Development and Deployment of Parallel Floating-Point Math Functionality on a System with Heterogeneous Hardware Components

ABSTRACT

System and method for configuring a system of heterogeneous hardware components, including at least one: programmable hardware element (PHE), digital signal processor (DSP) core, and programmable communication element (PCE). A program, e.g., a graphical program (GP), which includes floating point math functionality and which is targeted for distributed deployment on the system is created. Respective portions of the program for deployment to respective ones of the hardware components are automatically determined. Program code implementing communication functionality between the at least one PHE and the at least one DSP core and targeted for deployment to the at least one PCE is automatically generated. At least one hardware configuration program (HCP) is generated from the program and the code, including compiling the respective portions of the program and the program code for deployment to respective hardware components. The HCP is deployable to the system for concurrent execution of the program.

PRIORITY DATA

This application claims benefit of priority to U.S. Provisional Application 61/828,769, titled “Development and Deployment of Parallel Floating-Point Math Functionality on a System with Heterogeneous Hardware Components”, filed May 30, 2013, whose inventors were Jeffrey L. Kodosky, Hugo A. Andrade, Brian Keith Odom, Cary Paul Butler, Brian C. MacCleery, James C. Nagle, J. Marcus Monroe, and Alexandre M. Barp, which is hereby incorporated by reference in its entirety as though fully and completely set forth herein.

RESERVATION OF COPYRIGHT

A portion of the disclosure of this patent document contains material to which a claim of copyright protection is made. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but reserves all other rights whatsoever.

FIELD OF THE INVENTION

The present invention relates to the field of programming, and more particularly to development and deployment of parallel floating point math functionality on a system with heterogeneous hardware components, and global optimization of programs with floating point math functionality.

DESCRIPTION OF THE RELATED ART

Traditionally, high level text-based programming languages have been used by programmers in writing applications programs. Many different high level programming languages exist, including BASIC, C, FORTRAN, Pascal, COBOL, ADA, APL, etc. Programs written in these high level languages are translated to the machine language level by translators known as compilers. The high level programming languages in this level, as well as the assembly language level, are referred to as text-based programming environments.

Increasingly computers are required to be used and programmed by those who are not highly trained in computer programming techniques. When traditional text-based programming environments are used, the user's programming skills and ability to interact with the computer system often become a limiting factor in the achievement of optimal utilization of the computer system.

There are numerous subtle complexities which a user must master before he can efficiently program a computer system in a text-based environment. The task of programming a computer system to model a process often is further complicated by the fact that a sequence of mathematical formulas, mathematical steps or other procedures customarily used to conceptually model a process often does not closely correspond to the traditional text-based programming techniques used to program a computer system to model such a process. In other words, the requirement that a user program in a text-based programming environment places a level of abstraction between the user's conceptualization of the solution and the implementation of a method that accomplishes this solution in a computer program. Thus, a user often must substantially master different skills in order to both conceptually model a system and then to program a computer to model that system. Since a user often is not fully proficient in techniques for programming a computer system in a text-based environment to implement his model, the efficiency with which the computer system can be utilized to perform such modeling often is reduced.

Examples of fields in which computer systems are employed to model and/or control physical systems are the fields of instrumentation, process control, and industrial automation. Computer modeling or control of devices such as instruments or industrial automation hardware has become increasingly desirable in view of the increasing complexity and variety of instruments and devices available for use. However, due to the wide variety of possible testing/control situations and environments, and also the wide array of instruments or devices available, it is often necessary for a user to develop a program to control a desired system. As discussed above, computer programs used to control such systems had to be written in conventional text-based programming languages such as, for example, assembly language, C, FORTRAN, BASIC, or Pascal. Traditional users of these systems, however, often were not highly trained in programming techniques and, in addition, traditional text-based programming languages were not sufficiently intuitive to allow users to use these languages without training. Therefore, implementation of such systems frequently required the involvement of a programmer to write software for control and analysis of instrumentation or industrial automation data. Thus, development and maintenance of the software elements in these systems often proved to be difficult.

U.S. Pat. No. 4,901,221 to Kodosky et al discloses a graphical system and method for modeling a process, i.e. a graphical programming environment which enables a user to easily and intuitively model a process. The graphical programming environment disclosed in Kodosky et al can be considered the highest and most intuitive way in which to interact with a computer. A graphically based programming environment can be represented at a level above text-based high level programming languages such as C, Pascal, etc. The method disclosed in Kodosky et al allows a user to construct a diagram using a block diagram editor, such that the diagram created graphically displays a procedure or method for accomplishing a certain result, such as manipulating one or more input variables to produce one or more output variables. In response to the user constructing a data flow diagram or graphical program using the block diagram editor, machine language instructions are automatically constructed which characterize an execution procedure which corresponds to the displayed procedure. Therefore, a user can create a computer program solely by using a graphically based programming environment. This graphically based programming environment may be used for creating virtual instrumentation systems, industrial automation systems and modeling processes, as well as for any type of general programming.

Therefore, Kodosky et al teaches a graphical programming environment wherein a user places and manipulates icons in a block diagram using a block diagram editor to create a data flow “program.” A graphical program for controlling or modeling devices, such as instruments, processes or industrial automation hardware, is referred to as a virtual instrument (VI). In creating a virtual instrument, a user preferably creates a front panel or user interface panel. The front panel includes various front panel objects, such as controls or indicators that represent the respective input and output that will be used by the graphical program or VI, and may include other icons which represent devices being controlled. When the controls and indicators are created in the front panel, corresponding icons or terminals are automatically created in the block diagram by the block diagram editor. Alternatively, the user can first place terminal icons in the block diagram which cause the display of corresponding front panel objects in the front panel. The user then chooses various functions that accomplish his desired result, connecting the corresponding function icons between the terminals of the respective controls and indicators. In other words, the user creates a data flow program, referred to as a block diagram, representing the graphical data flow which accomplishes his desired function. This is done by wiring up the various function icons between the control icons and indicator icons. The manipulation and organization of icons in turn produces machine language that accomplishes the desired method or process as shown in the block diagram.

A user inputs data to a virtual instrument using front panel controls. This input data propagates through the data flow block diagram or graphical program and appears as changes on the output indicators. In an instrumentation application, the front panel can be analogized to the front panel of an instrument. In an industrial automation application the front panel can be analogized to the MMI (Man Machine Interface) of a device. The user adjusts the controls on the front panel to affect the input and views the output on the respective indicators.
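The propagation of control values through a data flow block diagram to its indicators can be illustrated with a minimal Python sketch. It is an illustration only and not the LabVIEW execution model: the function `run_block_diagram`, the dictionary layout, and the node names are all hypothetical, and the sketch assumes that a function node may execute once every one of its inputs is available, which is approximated here with a topological ordering.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_block_diagram(controls, function_nodes, indicators):
    """Propagate control values through function nodes to indicators.

    controls:       {control name: value}            (front panel inputs)
    function_nodes: {node name: (callable, [input names])}
    indicators:     {indicator label: name of the wire it displays}
    All three structures are hypothetical and chosen only for illustration.
    """
    values = dict(controls)  # controls act as the data sources
    # Each function node depends on the wires that feed its inputs.
    dependencies = {name: set(inputs)
                    for name, (_, inputs) in function_nodes.items()}
    for name in TopologicalSorter(dependencies).static_order():
        if name in values:
            continue  # a control: its value is already available
        func, inputs = function_nodes[name]
        values[name] = func(*(values[i] for i in inputs))
    return {label: values[source] for label, source in indicators.items()}


# Hypothetical diagram: output = sample * gain + offset
function_nodes = {
    "product": (lambda a, b: a * b, ["sample", "gain"]),
    "result": (lambda a, b: a + b, ["product", "offset"]),
}
indicators = {"output": "result"}

panel = run_block_diagram(
    {"sample": 3.0, "gain": 2.0, "offset": 0.5}, function_nodes, indicators
)
```

Re-running the sketch with a different value for a control, for example a `gain` of 4.0, changes the value reported for `output`, which mirrors a user adjusting a front panel control and viewing the result on an indicator.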

Thus, graphical programming has become a powerful tool available to programmers. Graphical programming environments such as the National Instruments LabVIEW product have become very popular. Tools such as LabVIEW have greatly increased the productivity of programmers, and increasing numbers of programmers are using graphical programming environments to develop their software applications. In particular, graphical programming tools are being used for test and measurement, data acquisition, process control, man machine interface (MMI), and supervisory control and data acquisition (SCADA) applications, among others.

A primary goal of virtual instrumentation is to provide the user the maximum amount of flexibility to create his/her own applications and/or define his/her own instrument functionality. In this regard, it is desirable to extend the level at which the user of instrumentation or industrial automation hardware is able to program an instrument. The evolution of the levels at which the user has been able to program an instrument is essentially as follows.

1. User level software (LabVIEW, LabWindows CVI, Visual Basic, etc.)
2. Kernel level software
3. Auxiliary kernel level software (a second kernel running alongside the main OS, e.g., InTime, VentureCom, etc.)
4. Embedded kernel level software (see, e.g., U.S. Pat. No. 6,173,438, referenced herein)
5. Hardware level software

In general, going down the above list, the user is able to create software applications which provide a more deterministic real-time response. Currently, most programming development tools for instrumentation or industrial automation provide an interface at level 1 above. In general, most users are unable and/or not allowed to program at the kernel level or auxiliary kernel level. The user level software typically takes the form of software tools that can be used to create software which operates at levels 1 and/or 4.

Many instrumentation solutions at level 5 primarily exist as vendor-defined solutions, i.e., vendor created modules. In contrast, the LabVIEW FPGA™ (field programmable gate array) development environment, provided by National Instruments Corporation, provides the user with the ability to develop user level software which operates at the hardware level. More particularly, it provides the user with the ability to develop high level software, such as graphical programs, which can then be readily converted into hardware level instrument functionality via implementation on an FPGA, thus providing the user with the dual benefits of being able to program instrument functionality at the highest level possible (text-based or graphical programs), while also providing the ability to have the created program operate directly in hardware for increased speed and efficiency.

Increasingly, complex functionality that was once implemented via multiple different devices or dedicated chips is implemented on a single chip, referred to as an SOC (System-On-Chip). Such chips may include various types of components, e.g., FPGAs, DSP (digital signal processor) cores, microprocessors, and so forth, that may operate in conjunction, e.g., in a parallel or concurrent manner. In current development systems, for a program targeted for deployment on such chips, the user is required to explicitly specify which portions of the program are to be deployed to which components of the chip, and must generally design such partitioning into the program, which is complex, difficult, tedious, and error prone.

SUMMARY OF THE INVENTION

The present invention comprises a computer-implemented system and method for automatically generating hardware level functionality, e.g., parallel system-on-chip (SOC) hardware implementations, including targeting and implementation of floating point math functionality on programmable hardware elements, e.g., programmable hardware or FPGA fabric, and other parallel heterogeneous hardware components, e.g., DSP cores, microprocessors, graphics processing units (GPUs), and so forth, integrated via various programmable communication elements (PCEs). The hardware implementation on such heterogeneous hardware components is generated based on a program, e.g., a graphical and/or textual program, created by a user. This provides the user the ability to develop or define instrument functionality using various programming techniques, e.g., graphical programming techniques, while enabling the resulting program to operate directly in hardware. It should be noted that the techniques disclosed herein are broadly applicable to a variety of types of programs, e.g., graphical programs, textual programs, or programs that include both graphical and textual program code. Embodiments of the invention disclosed herein are primarily described and illustrated in terms of graphical programs, e.g., LabVIEW programs, but should not be considered to restrict the embodiments contemplated to any particular type of program. Thus, for example, methods described in terms of graphical programs are also intended to be applicable to textual programs and/or combinations of the two.

In one embodiment, a program (e.g., graphical, textual, or both) that includes floating point math functionality may be created. The program may be targeted for distributed deployment on a system comprising heterogeneous hardware components, including, but not limited to, at least one programmable hardware element, at least one DSP core, and at least one programmable communication element (PCE).

In one embodiment, the user may first create the program, e.g., a graphical or textual program, which performs or represents the desired functionality. In graphical program implementations, the program will typically include one or more modules or a hierarchy of sub-VIs. Similarly, in textual program implementations, the program may include a hierarchy of functions or subprograms. In some embodiments, the user may place various constructs in portions of the (e.g., graphical) program to aid in conversion of these portions into hardware form. However, in other embodiments, the conversion process may be fully automatic, as described herein.

Respective portions of the program for respective deployment to respective ones of the heterogeneous hardware components may be automatically determined, including determining respective execution timing for the respective portions. In one embodiment, the respective portions may include a first portion targeted for deployment to the at least one programmable hardware element, and a second portion targeted for deployment to the at least one DSP core.

First program code implementing communication functionality (including timing functionality, possibly with constraints) between the at least one programmable hardware element and the at least one DSP core may be automatically generated. The first program code may be targeted for deployment to or on the at least one communication element.

The method may also include automatically generating at least one hardware configuration program from the program and the first program code, including compiling the respective portions of the program and the first program code for deployment to respective ones of the heterogeneous hardware components. Thus, for example, the first portion of the program may be compiled for deployment to the at least one programmable hardware element, thereby generating a first portion of the at least one hardware configuration program, the second portion of the program may be compiled for deployment to the at least one DSP core, thereby generating a second portion of the at least one hardware configuration program, and the automatically generated first program code implementing communication functionality may be compiled for deployment to the at least one communication element, thereby generating a third portion of the at least one hardware configuration program.

The hardware configuration program may be deployable to the system, where after deployment, the system may be configured to execute the program concurrently, including the floating point math functionality. Thus, for example, in one embodiment, deploying the at least one hardware configuration program may include configuring the at least one programmable hardware element with the first portion of the at least one hardware configuration program, configuring the at least one DSP core with the second portion of the at least one hardware configuration program, and configuring the at least one communication element with the third portion of the at least one hardware configuration program. Accordingly, during execution the at least one programmable hardware element performs the functionality of the first portion of the program, the at least one DSP core performs the functionality of the second portion of the program, and the at least one communication element implements communication between the at least one programmable hardware element and the at least one DSP core. In other words, the at least one hardware configuration program may be used to configure the system to implement the functionality of the program (including the floating point math functionality), after which the system may be operable to perform the respective functionality via the heterogeneous hardware components concurrently.
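The flow summarized above (determine portions, generate communication code for the PCE, and assemble three portions of a configuration) can be sketched in Python as follows. This is a toy model under stated assumptions and not the disclosed method: the `Node` class, the `partition` and `build_configuration` functions, and the node names are hypothetical, and the placement rule used here (floating point nodes to the DSP core, all other nodes to the programmable hardware element) is an arbitrary stand-in for whatever automatic determination an embodiment actually performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    is_floating_point: bool  # hypothetical attribute used by the toy rule below


def partition(nodes, edges):
    """Assign each node to a component and find cross-component wires.

    Toy rule (an assumption, not the patent's method): floating point
    nodes go to the DSP core, everything else to the PHE.
    """
    target = {n.name: ("DSP" if n.is_floating_point else "PHE") for n in nodes}
    # A wire whose endpoints land on different components needs a channel
    # implemented on the programmable communication element (PCE).
    channels = [(src, dst) for src, dst in edges if target[src] != target[dst]]
    return target, channels


def build_configuration(nodes, edges):
    """Collect the three portions of a (mock) hardware configuration."""
    target, channels = partition(nodes, edges)
    config = {"PHE": [], "DSP": [], "PCE": []}
    for n in nodes:
        config[target[n.name]].append(n.name)   # first and second portions
    for src, dst in channels:
        config["PCE"].append(f"{src}->{dst}")   # third portion: communication
    return config


# Hypothetical four-node program: read, scale, filter, write.
nodes = [
    Node("adc_read", False),
    Node("scale", True),
    Node("filter", True),
    Node("dac_write", False),
]
edges = [("adc_read", "scale"), ("scale", "filter"), ("filter", "dac_write")]
config = build_configuration(nodes, edges)
```

In this sketch the wire between `scale` and `filter` stays inside the DSP portion, so only the two wires that cross between the PHE and the DSP core produce entries in the PCE portion.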

In some embodiments, the program may be directly converted into a hardware configuration program, e.g., an FPGA program file, describing a plurality of computing elements, including, for example, but not limited to, one or more of: fixed point FPGA fabric, floating point FPGA fabric, DSP cores, soft or hardcore microprocessors, graphics processing units (GPUs), or other heterogeneous computing elements which are integrated in one heterogeneous or homogenous chip or chipset or multiple heterogeneous or homogenous chipsets.

The above techniques may also be applied to real-time or faster than real-time simulation, as well as global optimization of system designs via such simulation.

BRIEF DESCRIPTION OF THE DRAWINGS

A better understanding of the present invention can be obtained when the following detailed description of embodiments is considered in conjunction with the following drawings, in which:

FIG. 1A illustrates an exemplary instrumentation control system, according to one embodiment;

FIG. 1B illustrates an exemplary industrial automation system, according to one embodiment;

FIG. 2 is a block diagram of the exemplary computer system of FIGS. 1A and 1B, according to one embodiment;

FIG. 3 is a flowchart diagram of a method for developing and deploying a program, e.g., a graphical program, to a system of heterogeneous hardware components, according to one embodiment;

FIG. 4A illustrates an exemplary heterogeneous system-on-chip (SOC), according to one embodiment;

FIG. 4B illustrates an exemplary heterogeneous system implemented on multiple chips, according to one embodiment;

FIGS. 5A and 5B are block diagrams illustrating exemplary interface cards configured with heterogeneous programmable hardware, according to various embodiments of the present invention;

FIG. 6 is a high level flowchart diagram illustrating conversion of a program to a heterogeneous hardware implementation, according to one embodiment;

FIG. 7 is a more detailed flowchart diagram illustrating conversion of a program to a heterogeneous hardware implementation, including compiling a first portion of the program into machine language and converting a second portion of the program into a heterogeneous hardware implementation;

FIG. 8 is a flowchart diagram illustrating creation of a graphical program, according to one embodiment;

FIG. 9 is a flowchart diagram illustrating exporting at least a portion of a graphical program to a hardware description, according to one embodiment;

FIG. 10 is a flowchart diagram illustrating exporting a floating point input terminal into a heterogeneous hardware description, according to one embodiment;

FIG. 11 is a flowchart diagram illustrating exporting floating point function nodes into a heterogeneous hardware description, according to one embodiment;

FIG. 12 is a flowchart diagram illustrating exporting a floating point output terminal into a heterogeneous hardware description, according to one embodiment;

FIG. 13 is a flowchart diagram illustrating exporting a structure node into a heterogeneous hardware description, according to one embodiment;

FIG. 14 illustrates converting a node heterogeneous hardware description to a net list, according to one embodiment;

FIG. 15 illustrates converting a structure node hardware description to a net list, according to one embodiment;

FIG. 16 illustrates the floating point function block for a structure node implemented in heterogeneous hardware components, according to one embodiment;

FIG. 17 is a state diagram illustrating operation of the structure node function block of FIG. 16, according to one embodiment;

FIG. 18 illustrates an exemplary simple graphical program, according to one embodiment;

FIG. 19 is a conceptual diagram of the heterogeneous hardware description of the graphical program of FIG. 18 and communication mechanisms between heterogeneous hardware components, according to one embodiment;

FIG. 20 illustrates another exemplary graphical program, according to one embodiment;

FIG. 21 illustrates a tree of floating point data structures created in response to the graphical program of FIG. 20, and is a conceptual diagram of the heterogeneous hardware description of the graphical program of FIG. 20, according to one embodiment;

FIG. 22 is a circuit diagram of the heterogeneous hardware implementation of the mixed floating- and fixed-point graphical program of FIG. 20;

FIGS. 23-25 are graphical source code listings of an exemplary graphical program, according to one embodiment; and

FIG. 26 illustrates an exemplary circuit design suitable for global optimization via embodiments of the techniques disclosed.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and are herein described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.

DETAILED DESCRIPTION OF THE INVENTION

Incorporation by Reference

The following references are hereby incorporated by reference in their entirety as though fully and completely set forth herein:

U.S. Provisional Application 61/828,769, titled “Development and Deployment of Parallel Floating-Point Math Functionality on a System with Heterogeneous Hardware Components”, filed May 30, 2013.

U.S. patent application Ser. No. 13/347,880, titled “Co-Simulation with Peer Negotiated Time Steps”, filed Jan. 11, 2012.

U.S. patent application Ser. No. 12/752,606, titled “Race Structure for a Graphical Program”, filed Apr. 1, 2010.

U.S. patent application Ser. No. 12/577,284, titled “Asynchronous Preemptive Edit Time Semantic Analysis of a Graphical Program”, filed Oct. 12, 2009.

U.S. patent application Ser. No. 12/720,966, titled “Multi-Touch Editing in a Graphical Programming Language”, filed Mar. 10, 2010.

U.S. patent application Ser. No. 12/707,824, titled “Automatically Suggesting Graphical Program Elements for Inclusion in a Graphical Program”, filed Feb. 18, 2010.

U.S. Pat. No. 4,901,221 titled “Graphical System for Modeling a Process and Associated Method,” issued on Feb. 13, 1990.

U.S. Pat. No. 4,914,568 titled “Graphical System for Modeling a Process and Associated Method,” issued on Apr. 3, 1990.

U.S. Pat. No. 5,481,741 titled “Method and Apparatus for Providing Attribute Nodes in a Graphical Data Flow Environment”.

U.S. Pat. No. 5,734,863, titled “Method and Apparatus for Providing Improved Type Compatibility and Data Structure Organization in a Graphical Data Flow Diagram”.

U.S. Pat. No. 5,475,851 titled “Method and Apparatus for Improved Local and Global Variable Capabilities in a Graphical Data Flow Program”.

U.S. Pat. No. 5,497,500 titled “Method and Apparatus for More Efficient Function Synchronization in a Data Flow Program”.

U.S. Pat. No. 5,821,934, titled “Method and Apparatus for Providing Stricter Data Type Capabilities in a Graphical Data Flow Environment”.

U.S. Pat. No. 5,481,740 titled “Method and Apparatus for Providing Autoprobe Features in a Graphical Data Flow Diagram”.

U.S. Pat. No. 5,974,254, titled “System and Method for Detecting Differences in Graphical Programs” filed Jun. 6, 1997.

U.S. Pat. No. 6,173,438, titled “Embedded Graphical Programming System” filed Aug. 18, 1997.

U.S. Pat. No. 6,219,628, titled “System and Method for Converting Graphical Programs Into Hardware Implementations”.

U.S. Pat. No. 7,987,448, titled “Conversion of a first diagram having states and transitions to a graphical data flow program using an intermediate XML representation”.

U.S. Pat. No. 7,882,445, titled “Configurable Wires in a Statechart”.

U.S. Pat. No. 8,214,796, titled “Event Firing Node for Asynchronously Passing Events from a Graphical Data Flow Program to a Statechart”.

U.S. Pat. No. 8,151,244, titled “Merging graphical programs based on an ancestor graphical program”.

U.S. Pat. No. 8,204,925, titled “Controlling or Analyzing a Process by Solving a System of Linear Equations in Real-time”.

U.S. Pat. No. 8,239,824, titled “Developing a Graphical Data Flow Program with Multiple Models of Computation in a Web Browser”.

U.S. Pat. No. 7,992,129, titled “System and method for programmatically generating a graphical program based on a sequence of motion control, machine vision, and data acquisition (DAQ) operations”.

U.S. Pat. No. 7,996,782, titled “Data transfer indicator icon in a diagram”.

U.S. Pat. No. 8,050,882, titled “Network-based System for Automatically Generating a Graphical Program Based on User Supplied Measurement Task Requirements”.

U.S. Pat. No. 8,055,738, titled “Automatically Generating a Configuration Diagram Based on Task Requirements”.

U.S. Pat. No. 8,074,203, titled “Graphical Program Execution with Distributed Block Diagram Display”.

U.S. Pat. No. 8,099,712, titled “Generating a Hardware Description Based on a Diagram with States and State Transitions”.

U.S. Pat. No. 8,108,833, titled “Automatically Generating a Graphical Data Flow Program from a Statechart”.

U.S. Pat. No. 8,146,050, titled “Graphical Program with Physical Simulation and Data Flow Portions”.

U.S. Pat. No. 8,185,834, titled “User-Defined Events for a Graphical Programming Environment”.

U.S. Pat. No. 8,204,951, titled “Deterministic Communication Between Graphical Programs Executing on Different Computer Systems Using Variable Nodes”.

U.S. Pat. No. 8,239,158, titled “Synchronizing a Loop Performed by a Measurement Device with a Measurement and Control Loop Performed by a Processor of a Host Computer”.

U.S. Pat. No. 8,205,161, titled “Graphical Programming System with Event-Handling Nodes”.

U.S. Pat. No. 8,239,848, titled “Incremental Deployment and Execution of a Program on an Embedded Device”.

U.S. Pat. No. 8,239,177, titled “Simulation of a Motion System Including a Mechanical Modeler with Interpolation”.

U.S. Pat. No. 8,205,162, titled “Execution Contexts for a Graphical Program”.

U.S. Pat. No. 8,146,05, titled “Graphical Programming Environment with First Model of Computation that Includes a Structure Supporting Second Model of Computation”.

U.S. Pat. No. 8,205,188, titled “Automatically Generating a Second Graphical Program Based on a First Graphical Program”.

U.S. Pat. No. 7,568,178, titled “System Simulation and Graphical Data Flow Programming in a Common Environment Using Wire Data Flow”.

U.S. Pat. No. 8,074,201, titled “Deployment and Execution of a Program on an Embedded Device”.

U.S. Pat. No. 8,037,369, titled “Error Handling Structure For Use in a Graphical Program”.

The above-referenced patents and patent applications disclose various aspects of the LabVIEW graphical programming and development system.

The LabVIEW and BridgeVIEW graphical programming manuals, including the “G Programming Reference Manual”, available from National Instruments Corporation, are also hereby incorporated by reference in their entirety.

TERMS

The following is a glossary of terms used in the present application:

Memory Medium—Any of various types of memory devices or storage devices. The term “memory medium” is intended to include an installation medium, e.g., a CD-ROM, floppy disks 104, or tape device; a computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; a non-volatile memory such as a Flash, magnetic media, e.g., a hard drive, or optical storage; registers, or other similar types of memory elements, etc. The memory medium may comprise other types of memory as well or combinations thereof. In addition, the memory medium may be located in a first computer in which the programs are executed, or may be located in a second different computer which connects to the first computer over a network, such as the Internet. In the latter instance, the second computer may provide program instructions to the first computer for execution. The term “memory medium” may include two or more memory mediums which may reside in different locations, e.g., in different computers that are connected over a network.

Carrier Medium—a memory medium as described above, as well as a physical transmission medium, such as a bus, network, and/or other physical transmission medium that conveys signals such as electrical, electromagnetic, or digital signals.

Programmable Hardware Element—includes various hardware devices comprising multiple programmable function blocks connected via a programmable interconnect. Examples include FPGAs (Field Programmable Gate Arrays), PLDs (Programmable Logic Devices), FPOAs (Field Programmable Object Arrays), and CPLDs (Complex PLDs). The programmable function blocks may range from fine grained (combinatorial logic or look up tables) to coarse grained (arithmetic logic units or processor cores). A programmable hardware element may also be referred to as “reconfigurable logic”.
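As an illustration of a fine grained programmable function block, a look up table can be modeled in Python as a stored truth table that is indexed by the block's inputs, so that loading different table contents yields a different logic function. The sketch below is a simplified software model only; the `make_lut` helper, its input ordering, and the example tables are hypothetical and do not describe any particular device.

```python
def make_lut(truth_table_bits):
    """Return a function modeling a k-input look up table.

    truth_table_bits[i] is the output for the input combination whose
    binary encoding is i (first input is the most significant bit).
    The table length must be a power of two, at least 2.
    """
    size = len(truth_table_bits)
    num_inputs = size.bit_length() - 1
    if size < 2 or size != 1 << num_inputs:
        raise ValueError("truth table length must be a power of two >= 2")
    table = tuple(truth_table_bits)  # the 'configuration' of this block

    def lut(*inputs):
        if len(inputs) != num_inputs:
            raise ValueError(f"expected {num_inputs} inputs")
        index = 0
        for bit in inputs:
            index = (index << 1) | (1 if bit else 0)
        return table[index]

    return lut


# The same block structure 'programmed' with two different tables.
xor2 = make_lut([0, 1, 1, 0])
and2 = make_lut([0, 0, 0, 1])
```

Here `xor2` and `and2` share identical structure and differ only in their stored tables, which is the sense in which such a block is programmable.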

Software Program—the term “software program” is intended to have the full breadth of its ordinary meaning, and includes any type of program instructions, code, script and/or data, or combinations thereof, that may be stored in a memory medium and executed by a processor. Exemplary software programs include programs written in text-based programming languages, such as C, C++, PASCAL, FORTRAN, COBOL, JAVA, assembly language, etc.; graphical programs (programs written in graphical programming languages); assembly language programs; programs that have been compiled to machine language; scripts; and other types of executable software. A software program may comprise two or more software programs that interoperate in some manner. Note that various embodiments described herein may be implemented by a computer or software program. A software program may be stored as program instructions on a memory medium.

Hardware Configuration Program—a program, e.g., a netlist or bit file, that can be used to program or configure a programmable hardware element.

Program—the term “program” is intended to have the full breadth of its ordinary meaning. The term “program” includes 1) a software program which may be stored in a memory and is executable by a processor or 2) a hardware configuration program useable for configuring a programmable hardware element.

Graphical Program—A program comprising a plurality of interconnected nodes or icons, wherein the plurality of interconnected nodes or icons visually indicate functionality of the program. The interconnected nodes or icons are graphical source code for the program. Graphical function nodes may also be referred to as blocks.

The following provides examples of various aspects of graphical programs. The following examples and discussion are not intended to limit the above definition of graphical program, but rather provide examples of what the term “graphical program” encompasses:

The nodes in a graphical program may be connected in one or more of a data flow, control flow, and/or execution flow format. The nodes may also be connected in a “signal flow” format, which is a subset of data flow.

Exemplary graphical program development environments which may be used to create graphical programs include LabVIEW®, DasyLab™, DiaDem™ and Matrixx/SystemBuild™ from National Instruments, Simulink® from the MathWorks, VEE™ from Agilent, WiT™ from Coreco, Vision Program Manager™ from PPT Vision, SoftWIRE™ from Measurement Computing, Sanscript™ from Northwoods Software, Khoros™ from Khoral Research, SnapMaster™ from HEM Data, VisSim™ from Visual Solutions, ObjectBench™ by SES (Scientific and Engineering Software), and VisiDAQ™ from Advantech, among others.

The term “graphical program” includes models or block diagrams created in graphical modeling environments, wherein the model or block diagram comprises interconnected blocks (i.e., nodes) or icons that visually indicate operation of the model or block diagram; exemplary graphical modeling environments include Simulink®, SystemBuild™, VisSim™, Hypersignal Block Diagram™, etc.

A graphical program may be represented in the memory of the computer system as data structures and/or program instructions. The graphical program, e.g., these data structures and/or program instructions, may be compiled or interpreted to produce machine language that accomplishes the desired method or process as shown in the graphical program.

Input data to a graphical program may be received from any of various sources, such as from a device, unit under test, a process being measured or controlled, another computer program, a database, or from a file. Also, a user may input data to a graphical program or virtual instrument using a graphical user interface, e.g., a front panel.

A graphical program may optionally have a GUI associated with the graphical program. In this case, the plurality of interconnected blocks or nodes are often referred to as the block diagram portion of the graphical program.

Node—In the context of a graphical program, an element that may be included in a graphical program. The graphical program nodes (or simply nodes) in a graphical program may also be referred to as blocks. A node may have an associated icon that represents the node in the graphical program, as well as underlying code and/or data that implements functionality of the node. Exemplary nodes (or blocks) include function nodes, sub-program nodes, terminal nodes, structure nodes, etc. Nodes may be connected together in a graphical program by connection icons or wires.

Data Flow Program—A Software Program in which the program architecture is that of a directed graph specifying the flow of data through the program, and thus functions execute whenever the necessary input data are available. Data flow programs can be contrasted with procedural programs, which specify an execution flow of computations to be performed. As used herein “data flow” or “data flow programs” refer to “dynamically-scheduled data flow” and/or “statically-defined data flow”.
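As a non-limiting illustration of the data flow execution rule defined above, the following minimal Python sketch fires each function as soon as all of its inputs are available. The names `run_dataflow`, `nodes`, `wires`, and `sources` are hypothetical and are introduced here for illustration only; they do not appear in the described embodiments, and the sketch is not intended to represent how any particular development environment schedules its nodes.

```python
import operator

def run_dataflow(nodes, wires, sources):
    """Fire each node as soon as all of its inputs have arrived.

    nodes:   {name: (function, arity)}            (hypothetical layout)
    wires:   [(producer, consumer, input_port)]
    sources: {name: value} for terminals that already hold data
    """
    values = dict(sources)                        # outputs produced so far
    pending = {n: [None] * arity for n, (_, arity) in nodes.items()}
    missing = {n: arity for n, (_, arity) in nodes.items()}
    ready = list(sources)                         # producers with fresh output
    order = []                                    # firing order, for inspection
    while ready:
        producer = ready.pop(0)
        for src, dst, port in wires:
            if src != producer:
                continue
            pending[dst][port] = values[producer]  # data arrives on a wire
            missing[dst] -= 1
            if missing[dst] == 0:                  # all inputs available: fire
                func, _ = nodes[dst]
                values[dst] = func(*pending[dst])
                order.append(dst)
                ready.append(dst)
    return values, order

# (a + b) * c expressed as a two-node directed graph.
nodes = {"add": (operator.add, 2), "mul": (operator.mul, 2)}
wires = [("a", "add", 0), ("b", "add", 1), ("add", "mul", 0), ("c", "mul", 1)]
values, order = run_dataflow(nodes, wires, {"a": 1.5, "b": 2.5, "c": 2.0})
```

In this sketch no execution order is written down by the programmer; the order in which "add" and "mul" fire follows solely from when their input data become available, in contrast with a procedural program.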

Graphical Data Flow Program (or Graphical Data Flow Diagram)—A Graphical Program which is also a Data Flow Program. A Graphical Data Flow Program comprises a plurality of interconnected nodes (blocks), wherein at least a subset of the connections among the nodes visually indicate that data produced by one node is used by another node. A LabVIEW VI is one example of a graphical data flow program. A Simulink block diagram is another example of a graphical data flow program.

Graphical User Interface—this term is intended to have the full breadth of its ordinary meaning. The term “Graphical User Interface” is often abbreviated to “GUI”. A GUI may comprise only one or more input GUI elements, only one or more output GUI elements, or both input and output GUI elements.

The following provides examples of various aspects of GUIs. The following examples and discussion are not intended to limit the ordinary meaning of GUI, but rather provide examples of what the term “graphical user interface” encompasses:

A GUI may comprise a single window having one or more GUI Elements, or may comprise a plurality of individual GUI Elements (or individual windows each having one or more GUI Elements), wherein the individual GUI Elements or windows may optionally be tiled together.

A GUI may be associated with a graphical program. In this instance, various mechanisms may be used to connect GUI Elements in the GUI with nodes in the graphical program. For example, when Input Controls and Output Indicators are created in the GUI, corresponding nodes (e.g., terminals) may be automatically created in the graphical program or block diagram. Alternatively, the user can place terminal nodes in the block diagram which may cause the display of corresponding GUI Elements (front panel objects) in the GUI, either at edit time or later at run time. As another example, the GUI may comprise GUI Elements embedded in the block diagram portion of the graphical program.

Front Panel—A Graphical User Interface that includes input controls and output indicators, and which enables a user to interactively control or manipulate the input being provided to a program, and view output of the program, while the program is executing.

A front panel is a type of GUI. A front panel may be associated with a graphical program as described above.

In an instrumentation application, the front panel can be analogized to the front panel of an instrument. In an industrial automation application the front panel can be analogized to the MMI (Man Machine Interface) of a device. The user may adjust the controls on the front panel to affect the input and view the output on the respective indicators.

Graphical User Interface Element—an element of a graphical user interface, such as for providing input or displaying output. Exemplary graphical user interface elements comprise input controls and output indicators.

Input Control—a graphical user interface element for providing user input to a program. An input control displays the value input by the user and is capable of being manipulated at the discretion of the user. Exemplary input controls comprise dials, knobs, sliders, input text boxes, etc.

Output Indicator—a graphical user interface element for displaying output from a program. Exemplary output indicators include charts, graphs, gauges, output text boxes, numeric displays, etc. An output indicator is sometimes referred to as an “output control”.

Computer System—any of various types of computing or processing systems, including a personal computer system (PC), mainframe computer system, workstation, network appliance, Internet appliance, personal digital assistant (PDA), television system, grid computing system, or other device or combinations of devices. In general, the term “computer system” can be broadly defined to encompass any device (or combination of devices) having at least one processor that executes instructions from a memory medium.

Measurement Device—includes instruments, data acquisition devices, smart sensors, and any of various types of devices that are configured to acquire and/or store data. A measurement device may also optionally be further configured to analyze or process the acquired or stored data. Examples of a measurement device include an instrument, such as a traditional stand-alone “box” instrument, a computer-based instrument (instrument on a card) or external instrument, a data acquisition card, a device external to a computer that operates similarly to a data acquisition card, a smart sensor, one or more DAQ or measurement cards or modules in a chassis, an image acquisition device, such as an image acquisition (or machine vision) card (also called a video capture board) or smart camera, a motion control device, a robot having machine vision, and other similar types of devices. Exemplary “stand-alone” instruments include oscilloscopes, multimeters, signal analyzers, arbitrary waveform generators, spectroscopes, and similar measurement, test, or automation instruments.

A measurement device may be further configured to perform control functions, e.g., in response to analysis of the acquired or stored data. For example, the measurement device may send a control signal to an external system, such as a motion control system or to a sensor, in response to particular data. A measurement device may also be configured to perform automation functions, i.e., may receive and analyze data, and issue automation control signals in response.

Functional Unit (or Processing Element)—refers to various elements or combinations of elements. Processing elements include, for example, circuits such as an ASIC (Application Specific Integrated Circuit), portions or circuits of individual processor cores, entire processor cores, individual processors, programmable hardware devices such as a field programmable gate array (FPGA), and/or larger portions of systems that include multiple processors, as well as any combinations thereof.

Automatically—refers to an action or operation performed by a computer system (e.g., software executed by the computer system) or device (e.g., circuitry, programmable hardware elements, ASICs, etc.), without user input directly specifying or performing the action or operation. Thus the term “automatically” is in contrast to an operation being manually performed or specified by the user, where the user provides input to directly perform the operation. An automatic procedure may be initiated by input provided by the user, but the subsequent actions that are performed “automatically” are not specified by the user, i.e., are not performed “manually”, where the user specifies each action to perform. For example, a user filling out an electronic form by selecting each field and providing input specifying information (e.g., by typing information, selecting check boxes, radio selections, etc.) is filling out the form manually, even though the computer system must update the form in response to the user actions. The form may be automatically filled out by the computer system where the computer system (e.g., software executing on the computer system) analyzes the fields of the form and fills in the form without any user input specifying the answers to the fields. As indicated above, the user may invoke the automatic filling of the form, but is not involved in the actual filling of the form (e.g., the user is not manually specifying answers to fields but rather they are being automatically completed). The present specification provides various examples of operations being automatically performed in response to actions the user has taken.

Concurrent—refers to parallel execution or performance, where tasks, processes, or programs are performed in an at least partially overlapping manner. For example, concurrency may be implemented using “strong” or strict parallelism, where tasks are performed (at least partially) in parallel on respective computational elements, or using “weak parallelism”, where the tasks are performed in an interleaved manner, e.g., by time multiplexing of execution threads.
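By way of illustration only, the “weak parallelism” case above may be sketched as a round-robin scheduler that time multiplexes two tasks on a single computational element. The functions `task` and `interleave` are hypothetical names chosen for this sketch; Python generators stand in for execution threads, which is a simplifying assumption rather than a description of any embodiment.

```python
def task(name, steps):
    """A task that yields control back to the scheduler after every step."""
    for i in range(steps):
        yield f"{name}{i}"

def interleave(tasks):
    """Round-robin scheduler: run one step of each live task in turn."""
    trace = []
    queue = list(tasks)
    while queue:
        current = queue.pop(0)
        try:
            trace.append(next(current))   # execute one time slice
            queue.append(current)         # task not finished: requeue it
        except StopIteration:
            pass                          # task finished: drop it
    return trace

trace = interleave([task("A", 3), task("B", 2)])
```

The resulting trace alternates between steps of task A and task B, so the two tasks overlap in time even though only one step executes at any instant. Under “strong” parallelism the same two tasks would instead run on respective computational elements.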

FIG. 1A—Exemplary Instrumentation Control System

FIG. 1A illustrates an exemplary instrumentation control system 100 which may implement embodiments of the invention. The system 100 comprises a host computer 82 which couples to one or more instruments. The host computer 82 may comprise a CPU, a display screen, memory, and one or more input devices such as a mouse or keyboard as shown. The computer system 82 may include at least one memory medium on which one or more computer programs or software components according to one embodiment of the present invention may be stored. For example, the memory medium may store one or more graphical programs which are executable to perform the methods described herein. Additionally, the memory medium may store a graphical (or textual) programming development environment application used to create, deploy, and/or implement or execute such graphical (or textual) programs on heterogeneous hardware systems, i.e., systems with heterogeneous hardware components, e.g., including one or more of the instruments shown in FIG. 1A. The memory medium may also store operating system software, as well as other software for operation of the computer system. Various embodiments further include receiving or storing instructions and/or data implemented in accordance with the foregoing description upon a carrier medium. The computer 82 may operate with the one or more instruments to analyze, measure or control a unit under test (UUT) or process 150, e.g., via execution of software 104.

The one or more instruments may include a GPIB instrument 112 and associated GPIB interface card 122, a data acquisition board 114 inserted into or otherwise coupled with chassis 124 with associated signal conditioning circuitry 126, a VXI instrument 116, a PXI instrument 118, a video device or camera 132 and associated image acquisition (or machine vision) card 134, a motion control device 136 and associated motion control interface card 138, and/or one or more computer based instrument cards 142, among other types of devices. The computer system may couple to and operate with one or more of these instruments. The instruments may be coupled to the unit under test (UUT) or process 150, or may be coupled to receive field signals, typically generated by transducers. The system 100 may be used in a data acquisition and control application, in a test and measurement application, an image processing or machine vision application, a process control application, a man-machine interface application, a simulation application, or a hardware-in-the-loop validation application, among others.

FIG. 1B—Exemplary Automation System

FIG. 1B illustrates an exemplary industrial automation system 200 which may implement embodiments of the invention. The industrial automation system 200 is similar to the instrumentation or test and measurement system 100 shown in FIG. 1A. Elements which are similar or identical to elements in FIG. 1A have the same reference numerals for convenience. The system 200 may comprise a computer 82 which couples to one or more devices or instruments. The computer 82 may comprise a CPU, a display screen, memory, and one or more input devices such as a mouse or keyboard as shown. The computer 82 may operate with the one or more devices to perform an automation function with respect to a process or device 150, such as MMI (Man Machine Interface), SCADA (Supervisory Control and Data Acquisition), portable or distributed data acquisition, process control, advanced analysis, or other control, among others, e.g., via execution of software 104.

The one or more devices may include a data acquisition board 114 inserted into or otherwise coupled with chassis 124 with associated signal conditioning circuitry 126, a PXI instrument 118, a video device 132 and associated image acquisition card 134, a motion control device 136 and associated motion control interface card 138, a fieldbus device 270 and associated fieldbus interface card 172, a PLC (Programmable Logic Controller) 176, a serial instrument 282 and associated serial interface card 184, or a distributed data acquisition system, such as the Fieldpoint system available from National Instruments, among other types of devices.

Note that in the exemplary systems of FIGS. 1A and 1B, one or more of the devices connected to the computer 82 may include programmable hardware according to the present invention. In some embodiments, the programmable hardware includes at least one programmable hardware element, e.g., an FPGA (field programmable gate array), an SOC (system-on-chip), or other heterogeneous computing devices containing resources capable of parallel execution. In some embodiments, the programmable hardware may be or include an FPGA fabric. As will be described below in detail, a program, such as a graphical (or textual) program, with floating point math functionality may be implemented in hardware with communication mechanisms between heterogeneous computing elements, which in some embodiments may be located in one or more SOCs or other computing devices, and the hardware components may be networked together locally or remotely, where computation by the components may be synchronized to achieve desired execution timing and parallelization of the respective computing tasks.

The instruments or devices in FIGS. 1A and 1B may be controlled by graphical software programs, optionally a portion of which execute on the CPU of the computer 82, and at least a portion of which may be downloaded (deployed) to the programmable hardware for hardware execution.

In one embodiment, the computer system 82 itself may include a heterogeneous system as described herein, e.g., on an expansion card or connected device. Note, however, that in various embodiments, the configured (via embodiments disclosed herein) heterogeneous system may be implemented or included in any type of devices desired.

Moreover, although in some embodiments the programs and programmable hardware may be involved with data acquisition/generation, analysis, and/or display, and/or for controlling or modeling instrumentation or industrial automation hardware, it is noted that the present invention can be used to create hardware implementations of programs for a plethora of applications and is not limited to instrumentation or industrial automation applications. In other words, the systems of FIGS. 1A and 1B are exemplary only, and the present invention may be used in any of various types of systems. Thus, the systems and methods of the present invention are operable for automatically creating hardware implementations of programs or graphical (or textual) code for any of various types of applications, including general purpose software applications such as word processing, spreadsheets, network control, games, etc.

Exemplary Systems

Embodiments of the present invention may be involved with performing test and/or measurement functions; controlling and/or modeling instrumentation or industrial automation hardware; modeling and simulation functions, e.g., modeling or simulating a device or product being developed or tested, etc. Exemplary test applications where the program may be used include hardware-in-the-loop testing and rapid control prototyping, among others. More generally, in various embodiments, the heterogeneous system may be used in any type of application desired, e.g., in real-time, faster-than-real-time and slower-than-real-time simulation, digital signal processing, algorithms, mathematics, optimization and search, among others. For example, in one embodiment, the techniques disclosed herein may be applied to the field of system simulation, e.g., simulation of a system such as a circuit, electric power grid, motor, generator, communication network or other complex physical system. The program(s) implemented and processed per the techniques described may further be directed to any of a plurality of execution contexts for desktop or real-time computer targets.

However, it is noted that embodiments of the present invention can be used for a plethora of applications and are not limited to the above applications. In other words, applications discussed in the present description are exemplary only, and embodiments of the present invention may be used in any of various types of systems. Thus, embodiments of the system and method of the present invention are configured to be used in any of various types of applications, including the control of other types of devices such as multimedia devices, video devices, audio devices, telephony devices, Internet devices, etc., as well as general purpose software applications such as word processing, spreadsheets, network control, network monitoring, financial applications, games, etc. Further applications contemplated include hardware-in-the-loop testing and simulation, and rapid control prototyping, among others.

It should also be noted that some embodiments of the methods disclosed herein may be performed or implemented on a computer, such as computer 82, that is not connected to instrumentation or automation devices (as exemplified in FIGS. 1A and 1B), where the method may produce one or more products, such as a hardware configuration program, that may be subsequently used by the computer 82 or conveyed to another computing device for use, e.g., to configure a heterogeneous system.

In the embodiments of FIGS. 1A and 1B above, one or more of the various devices may couple to each other over a network, such as the Internet. In one embodiment, the user operates to select a target device from a plurality of possible target devices for programming or configuration using a program, e.g., a graphical program. Thus the user may create a program on a computer and use (execute) the program on that computer or deploy the program to a target device (for remote execution on the target device) that is remotely located from the computer and coupled to the computer through a network.

Graphical software programs which perform data acquisition, analysis and/or presentation, e.g., for measurement, instrumentation control, industrial automation, modeling, or simulation, such as in the applications shown in FIGS. 1A and 1B, may be referred to as virtual instruments.

FIG. 2—Computer System Block Diagram

FIG. 2 is a block diagram representing one embodiment of the computer system 82 illustrated in FIGS. 1A and 1B. It is noted that any type of computer system configuration or architecture can be used as desired, and FIG. 2 illustrates a representative PC embodiment. It is also noted that the computer system may be a general purpose computer system, a computer implemented on a card installed in a chassis, or other types of embodiments. Elements of a computer not necessary to understand the present description have been omitted for simplicity.

The computer may include at least one central processing unit or CPU (processor) 160 which is coupled to a processor or host bus 162. The CPU 160 may be any of various types, including an x86 processor, e.g., a Pentium class, a PowerPC processor, a CPU from the SPARC family of RISC processors, an ARM processor, a GPU processor, as well as others. A memory medium, typically comprising RAM and referred to as main memory, 166 is coupled to the host bus 162 by means of memory controller 164. The main memory 166 may store a programming system, and may also store software for converting at least a portion of a program into a hardware implementation. This software will be discussed in more detail below. The main memory may also store operating system software, as well as other software for operation of the computer system.

The host bus 162 may be coupled to an expansion or input/output bus 170 by means of a bus controller 168 or bus bridge logic. The expansion bus 170 may be the PCI (Peripheral Component Interconnect) expansion bus, although other bus types can be used. The expansion bus 170 includes slots for various devices such as described above. In the exemplary embodiment shown, the computer 82 further comprises a video display subsystem 180 and hard drive 182 coupled to the expansion bus 170, as well as a communication bus 183. The computer 82 may also comprise a GPIB card 122 coupled to a GPIB bus 112, and/or an MXI device 186 coupled to a VXI chassis 116.

As shown, a device 190 may also be connected to the computer. The device 190 may include a processor and memory which may execute a real time operating system. The device 190 may also or instead comprise a programmable hardware element. More generally, the device may comprise heterogeneous hardware components, such as one or more SOCs, at least one of which may itself include heterogeneous hardware components, as discussed herein. The computer system may be configured to deploy a program to the device 190 for execution of the program on the device 190. In embodiments where the program is a graphical program, the deployed program may take the form of graphical program instructions or data structures that directly represent the graphical program. Alternatively, the deployed graphical program may take the form of text code (e.g., C code) generated from the graphical program. As another example, the deployed graphical program may take the form of compiled code generated from either the graphical program or from text code that in turn was generated from the graphical program. Of course, as noted above, in some embodiments, the program may be a textual program, or a combination of graphical and textual program code.

FIG. 3—Flowchart of a Method for Developing and Deploying a Program with Floating Point Math Functionality to a System with Heterogeneous Hardware Components

FIG. 3 illustrates a method for developing and deploying a program, e.g., a graphical and/or textual program, with floating point math functionality to a system that includes heterogeneous hardware components, e.g., multiple programmable elements, according to one embodiment. The method shown in FIG. 3 may be used in conjunction with any of the computer systems or devices shown in the Figures, among other devices. In various embodiments, some of the method elements shown may be performed concurrently, in a different order than shown, or may be omitted. Additional method elements may also be performed as desired. As shown, this method may operate as follows.

First, in 3002, a program may be created on the computer system 82 (or on a different computer system). The program may include floating point math functionality (among other functionalities), and may be targeted for distributed deployment on a system that includes heterogeneous hardware components. For example, in one embodiment, the system may include at least one programmable hardware element, at least one digital signal processor (DSP) core, and at least one programmable communication element (PCE), although other hardware components are also contemplated (see, e.g., FIGS. 4B, 5A, and 5B, described below). It should be noted that in addition to the floating point math functionality, the program may include any other types of functionality as desired, e.g., fixed point math functionality, integer math functionality, string manipulation, etc.

Exemplary PCEs include, but are not limited to, various data transfer mechanisms, internal communication elements, programmable interconnect elements, configurable logic blocks, switch matrices, clock lines, input/output buffers (IOBs), serial data buses, and parallel data buses used to connect heterogeneous hardware components and systems of heterogeneous hardware, e.g., programmable hardware elements, DSP cores, microprocessors, and GPUs. These PCEs may be internal to a heterogeneous system-on-a-chip (HSOC), external to the HSOC, or may be associated with a heterogeneous system implemented on multiple chips. These PCEs may be “hard-core” hardware elements dedicated to a task, or “soft-core” hardware elements created through automatic reconfiguration of resources to create a programmable communication element which is configured for a particular task, operation, communication protocol, or bus.

FIG. 4A illustrates an exemplary heterogeneous system in the form of a heterogeneous SOC, or HSOC. More specifically, the embodiment of FIG. 4A is a hybrid DSP/FPGA/uP (microprocessor) SOC. As may be seen, the HSOC includes programmable hardware, e.g., one or more programmable hardware elements, such as an FPGA fabric, one or more DSP cores, one or more microprocessors (uPs) and/or GPUs, and both internal and external programmable communication elements.

FIG. 4B illustrates another heterogeneous system that includes multiple SOCs, including both homogeneous SOCs and heterogeneous SOCs. More specifically, the embodiment of FIG. 4B includes three HSOCs, a homogeneous microprocessor chip, a homogeneous DSP chip, a homogeneous FPGA chip, and a homogeneous GPU (graphics processing unit) chip. As may be seen, the various components are communicatively coupled, and may be configured to execute a program in distributed fashion, as described below. Further exemplary heterogeneous systems are described below with reference to FIGS. 5A and 5B.

As noted above, in some embodiments the program may be a graphical program. The graphical program may be created or assembled by the user arranging on a display a plurality of nodes or icons and then interconnecting the nodes to create the graphical program. In response to the user assembling the graphical program, data structures may be created and stored which represent the graphical program. The nodes may be interconnected in one or more of a data flow, control flow, or execution flow format. The graphical program may thus comprise a plurality of interconnected nodes or icons which visually indicate the functionality of the program. As noted above, the graphical program may comprise a block diagram and may also include a user interface portion or front panel portion. Where the graphical program includes a user interface portion, the user may optionally assemble the user interface on the display. As one example, the user may use the LabVIEW™ graphical programming development environment to create the graphical program.

In an alternate embodiment, the graphical program may be created in 3002 by the user creating or specifying a prototype, followed by automatic or programmatic creation of the graphical program from the prototype. This functionality is described in U.S. patent application Ser. No. 09/587,682 titled “System and Method for Automatically Generating a Graphical Program to Perform an Image Processing Algorithm”, which is hereby incorporated by reference in its entirety as though fully and completely set forth herein. The graphical program may be created in other manners, either by the user or programmatically, as desired. The graphical program may implement a measurement function that is desired to be performed by the instrument. In other embodiments, the program may be a textual program, e.g., in C, C++, JAVA, etc., as desired.

In some embodiments, the program may be generated from any of a variety of sources, e.g., at least one text-based program, other graphical diagrams, e.g., at least one simulation or model, at least one circuit diagram, at least one network diagram, or at least one statechart, among others.

Embodiments of the present invention may further include graphical data transfer and synchronization mechanisms that enable a plurality of targets executing floating-point math to simulate complex physical systems in which measurements, state-values, inputs, outputs, and parameters may be shared between targets and, in graphical program embodiments, may be represented using graphical floating-point programming constructs such as nodes, functions and wires. In other words, the graphical data transfer and synchronization mechanisms may be deployable to the heterogeneous hardware components, thereby enabling the heterogeneous hardware components implementing the floating-point math functionality to simulate physical systems in which measurements, state-values, inputs, outputs and parameters are shared between the heterogeneous hardware components.

Moreover, embodiments disclosed herein may provide the ability to generate floating-point graphical programming diagrams suitable for execution on programmable hardware, e.g., FPGA hardware, from any of a plurality of system modeling environments and languages, including for example, but not limited to, SPICE, Modelica, Mathscript, VHDL-AMS, and other languages used to capture model descriptions, and may further provide the ability to automatically generate and configure (e.g., graphical) floating-point code and graphical floating point memory references, event triggers and other (possibly graphical) programming constructs necessary for execution of the simulation models and math functions on the programmable hardware using (e.g., graphical) floating point programming, as well as in a desktop emulation context.

For example, in a graphical program implementation, at least some of the wires may represent a floating-point data type, and the plurality of nodes may include at least one node configured to asynchronously send one or more trigger events, measurements, parameters, state values and other data to an external FPGA. Thus, in some embodiments, the deployed program executing on the programmable hardware may be configured to receive and respond to programmatic events, such as events related to the state of floating-point values represented using graphical dataflow programming techniques and executed on programmable hardware or in a desktop emulation context.

In 3004, respective portions of the program may be automatically determined for deployment to respective ones of the heterogeneous hardware components, including automatically determining execution timing for the respective portions. In one embodiment, the respective portions may include a first portion targeted for deployment to the at least one programmable hardware element, and a second portion targeted for deployment to the at least one DSP core. Note that in other embodiments, portions of the program may be targeted for deployment to other heterogeneous hardware components, as desired.

In some embodiments, the timing of the communication between PCEs and the timing of execution of the portions of the program on the heterogeneous hardware components may be automatically determined based on the way in which the program is targeted for distributed deployment on the system of heterogeneous hardware components. Alternately, the respective portions of the program for deployment to the heterogeneous hardware components may be determined automatically based on the timing of the communication between PCEs and the timing of execution of the portions of the program on the heterogeneous hardware components. In one embodiment that combines the automation of the above tasks, the timing of the communication between PCEs, the timing of the execution of the portions of the program on the heterogeneous hardware components, and the partitioning of the program for targeted distributed deployment to respective heterogeneous hardware components may all be automatically determined.

In 3006, first program code implementing communication functionality (including timing functionality, possibly with constraints) between the heterogeneous hardware components, e.g., between the at least one programmable hardware element and the at least one DSP core, may be automatically generated. The first program code may be targeted for deployment to or on the at least one programmable communication element.

The at least one PCE may include one or more PCEs for internal communications between the at least one programmable hardware element and the at least one DSP core. In one embodiment, the at least one PCE may include at least one I/O block for communications between the at least one programmable hardware element or the at least one DSP core and external components or systems.

In 3008, at least one hardware configuration program may be automatically generated from the program and the first program code. The automatic generation of the hardware configuration program may include compiling the respective portions of the program and the first program code for deployment to respective ones of the heterogeneous hardware components. Thus, for example, the first portion of the program may be compiled for deployment to the at least one programmable hardware element, thereby generating a first portion of the at least one hardware configuration program, the second portion of the program may be compiled for deployment to the at least one DSP core, thereby generating a second portion of the at least one hardware configuration program, and the automatically generated first program code implementing communication functionality (including timing functionality) may be compiled for deployment to the at least one communication element, thereby generating a third portion of the at least one hardware configuration program.
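The three-part structure of the hardware configuration program described above can be sketched as follows. This is a minimal illustration only: the `Portion` type, the target labels "PHE", "DSP", and "PCE", and the placeholder `compile_portion` function are assumptions made for the example and do not correspond to any actual compiler or tool chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Portion:
    """Hypothetical record for one portion of the program."""
    name: str
    target: str   # assumed labels: "PHE", "DSP", or "PCE"
    source: str   # stand-in for the program code of this portion


def compile_portion(portion):
    # Placeholder "compiler": a real flow would invoke a target-specific
    # tool here (e.g., synthesis for the PHE, a DSP compiler for the DSP core).
    return f"[{portion.target}] {portion.name}: {portion.source}".encode()


def build_hcp(portions):
    """Group the compiled portions by target to form the sections of the
    hardware configuration program (HCP)."""
    hcp = {}
    for portion in portions:
        hcp.setdefault(portion.target, []).append(compile_portion(portion))
    return hcp


portions = [
    Portion("filter", "PHE", "first portion of the program"),
    Portion("fft", "DSP", "second portion of the program"),
    Portion("link", "PCE", "generated communication code"),
]
hcp = build_hcp(portions)
```

Each key of `hcp` then corresponds to one of the first, second, and third portions of the hardware configuration program.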

The hardware configuration program may be deployable to the system, where after the deployment, the system may be configured to execute the program concurrently, e.g., in parallel, including the floating point math functionality. Thus, for example, in one embodiment, deploying the at least one hardware configuration program may include configuring the at least one programmable hardware element with the first portion of the at least one hardware configuration program, configuring the at least one DSP core with the second portion of the at least one hardware configuration program, and configuring the at least one communication element with the third portion of the at least one hardware configuration program. Accordingly, during execution the at least one programmable hardware element performs the functionality of the first portion of the program, the at least one DSP core performs the functionality of the second portion of the program, and the at least one communication element implements communication between the at least one programmable hardware element and the at least one DSP core. In other words, the at least one hardware configuration program may be used to configure the system to implement the functionality of the program (including the floating point math functionality), after which the system may be operable to perform the respective functionality via the heterogeneous hardware components concurrently, e.g., in parallel.

In some embodiments, the hardware configuration program may be directly converted into an FPGA program file describing a plurality of computing elements, including, for example, but not limited to, one or more of: fixed point FPGA fabric, floating point FPGA fabric, DSP cores, soft or hard core microprocessors, graphics processing units (GPUs), or other heterogeneous computing elements which are integrated in one heterogeneous or homogeneous chipset or multiple heterogeneous or homogeneous chipsets.

FIGS. 5A and 5B are high level block diagrams of further exemplary heterogeneous systems that may be configured according to embodiments of the present invention. More specifically, the systems of FIGS. 5A and 5B are exemplary interface cards configured with programmable hardware according to various embodiments of the present invention. It is noted that the embodiments shown in FIGS. 5A and 5B are exemplary only, and that an interface card or device configured with programmable hardware according to the present invention may have any of various architectures or forms, as desired. The interface cards illustrated in FIGS. 5A and 5B may be embodiments of the DAQ interface card 114 shown in either of FIG. 1A or 1B. However, as noted above, the programmable hardware may be included on any of the various devices shown in FIG. 1A or 1B, or on other devices, as desired.

As may be seen, in the embodiment of FIG. 5A, the interface card includes an HSOC 200, such as the HSOC of FIG. 4A. The card also includes an I/O connector 202 which is coupled for receiving signals. The I/O connector 202 may present analog and/or digital connections for receiving/providing analog or digital signals, respectively. The I/O connector 202 may further be adapted for coupling to SCXI conditioning logic 124 and 126 (see FIGS. 1A and 1B), or may be adapted to be coupled directly to a unit under test 130 or process 160.

As shown, the interface card may also include data acquisition (DAQ) logic 204, which may include analog to digital (A/D) converters, digital to analog (D/A) converters, timer counters (TC) and signal conditioning (SC) logic as indicated. The DAQ logic 204 may provide the data acquisition functionality of the DAQ card.

As shown, the interface card may further include bus interface logic 216 and a control/data bus 218. In one embodiment, the interface card is a PCI bus-compliant interface card adapted for coupling to the PCI bus of the host computer 102, or adapted for coupling to a PXI (PCI eXtensions for Instrumentation) bus. The bus interface logic 216 and the control/data bus 218 thus present a PCI or PXI interface.

The interface card 114 also includes local bus interface logic 208. In one embodiment, the local bus interface logic 208 presents a RTSI (Real Time System Integration) bus for routing timing and trigger signals between the interface card 114 and one or more other devices or cards.

The HSOC 200 is shown coupled to the DAQ logic 204 and also coupled to the local bus interface 208, as well as control/data bus 218. Thus a program can be created on the computer 82, or on another computer in a networked system, and at least a portion of the program can be converted into a hardware implementation form for execution on or by the HSOC 200. The portion of the program converted into a hardware implementation form is preferably a portion which requires fast and/or real-time execution.

In the embodiment of FIG. 5A, the interface card further includes a dedicated on-board microprocessor (μP) and/or GPU 212 and memory 214. This enables a portion of the program to be compiled into machine language for storage in the memory 214 and execution by the microprocessor 212. This may be in addition to a portion of the program being converted into a hardware implementation form for the HSOC 200. Thus, in one embodiment, after a program has been created, a portion of the program may be compiled for execution on the embedded microprocessor 212 and may execute locally on the interface card via the microprocessor 212 and memory 214, and other portions of the program may be translated or converted into a hardware executable format and downloaded to the HSOC 200 for hardware implementation, as described in more detail herein.

Turning now to FIG. 5B, in this exemplary embodiment, the HSOC 200, microprocessor 212, and memory 214 are not included on the interface card; rather, a DSP core 207 and a programmable hardware element 206, e.g., an FPGA, with at least one programmable communication element (PCE) are included, and thus only the portions of the program which are converted into hardware implementation form are downloaded to the card, specifically, to the programmable hardware element (e.g., FPGA) 206, the programmable communication element(s), and the DSP core 207. Thus in the embodiment of FIG. 5B, any supervisory control portion of the program which is necessary or desired to execute in machine language on a programmable CPU may be executed by the host CPU in the computer system 102 or some other processor communicatively coupled to the card, rather than being executed locally by a CPU on the interface card.

Further Exemplary Embodiments

The following presents various further exemplary embodiments of the present invention, although these embodiments are not intended to limit the invention or its application to any particular implementation or use.

In one embodiment, the system may include a host computer and a measurement device having a programmable hardware element. The programmable hardware element may be configured to perform a loop to acquire floating point data from a physical system measurement or a measurement from a system simulated in the programmable hardware element using (possibly graphical) floating-point programming constructs, or both. The host computer may be configured to perform another loop to read the simulated and/or physical measurement data from the programmable hardware element and use the measurement data in a simulation, measurement and control algorithm. The host computer or measurement device may be further configured to perform a synchronization algorithm to keep the simulation and physical measurement data acquisition loop performed by the programmable hardware element synchronized with a measurement, simulation, and control loop performed by the host computer. In some embodiments, the system may include a plurality of FPGA devices and a plurality of host computers.

In another embodiment, the system may be configured (e.g., by the program) to implement communication of floating point data between a first programmable hardware element or computer and a second programmable hardware element or computer over a direct digital connection.

Some embodiments may be implemented at the chip level. For example, in one embodiment, the system may include a heterogeneous system on a chip (see, e.g., FIG. 5A). In another embodiment, the system may include a heterogeneous system implemented on multiple chips (see, e.g., FIG. 5B). The at least one PCE may be configurable for intra-chip communications or inter-chip communications.

In one embodiment, the method may include automatically deploying the hardware configuration program to the system.

In some embodiments, the program may include multiple models of computation, e.g., different portions of the program may operate in accordance with different models of computation, e.g., data flow, control flow, procedural, declarative, and so forth, as desired. In one embodiment, the program may include code (e.g., graphical program code or structures) directed to multiple different physical domains, e.g., code simulating or related to one or more of electrical power, electronics, hydrodynamics, chemistry, physics, thermodynamics, among others, as desired.

It should be noted that any of the techniques disclosed herein or described in any of the references incorporated by reference above may be used in any combinations desired.

FIG. 6—Conversion of Graphical Code into a Heterogeneous Hardware Implementation

Referring now to FIG. 6, a flowchart diagram is shown illustrating one embodiment of the present invention where the program is a graphical program, although it should be noted that the graphical program implementation is exemplary only, and that the method elements of FIG. 6 are also applicable to text based (i.e., textual) programs and/or combinations of textual and graphical programs. Below is described a computer-implemented method for generating heterogeneous hardware implementations of graphical programs or graphical code with floating point math functionality; however, it should be noted that the techniques disclosed are also applicable to textual programs, the graphical embodiments being exemplary only. In various embodiments, some of the method elements shown may be performed concurrently, in a different order than shown, or may be omitted. Additional method elements may also be performed as desired. As shown, the method may operate as follows.

The method below presumes that a graphical programming development system is stored in the memory of the computer system for creation of graphical programs with floating point math functionality. However, it should be noted that other functionality may also be included in the graphical program, e.g., fixed point math functionality, etc. In one embodiment, the graphical programming system is the LabVIEW graphical programming system available from National Instruments. In this system, the user may create the graphical program in a graphical program editor, e.g., via a graphical program panel, referred to as a block diagram window, and may also create a user interface in a graphical front panel. The graphical program is sometimes referred to as a virtual instrument (VI). The graphical program or VI will typically have a hierarchy of sub-graphical programs or sub-VIs.

As shown, in step 302 the user first receives (or creates) a graphical (or textual) program, also sometimes referred to as a block diagram. In one embodiment, the graphical program comprises a graphical data flow diagram which specifies functionality of the program to be performed. This graphical data flow diagram is preferably directly compilable into machine language code for execution on a computer system. In some exemplary embodiments, the graphical program may include floating point functionality and program code implementing communication functionality, including timing functionality.

In step 304 the method operates to export at least a portion of the graphical program (with floating point math functionality) to a heterogeneous hardware description. Thus, after the user has created a graphical program in step 302, the user selects an option to export a portion of the graphical program to a heterogeneous hardware description. The hardware description may be a VHDL description, e.g., a VHDL source file, or alternatively may be a high level net list description. The heterogeneous hardware description comprises a high level hardware description of floating point function blocks, logic, inputs, and outputs which perform the operation indicated by the graphical program. The operation of exporting at least a portion of a graphical program to a hardware description is discussed in more detail with reference to the flowchart of FIG. 9.

As noted above, in some embodiments, the determination of respective portions of the graphical (or textual) program targeted to respective hardware components of the system may be automatic. In other words, the method may automatically partition the graphical program into respective portions for deployment to the respective hardware components.

Alternatively, in one embodiment, during creation of the graphical program in step 302 the user specifies portions, e.g., sub-VIs, which are to be exported to the heterogeneous hardware description format for conversion into a hardware implementation. In another embodiment, when the user selects the option to export a portion of the graphical program to the heterogeneous hardware description format, the user selects at that time which modules or sub-VIs are to be exported to the heterogeneous hardware description.

In step 306 the method may operate to convert the heterogeneous hardware description into an FPGA-specific net list. The net list describes the components required to be present in the hardware as well as their interconnections. Conversion of the heterogeneous hardware description into the FPGA-specific net list may be performed by any of various types of commercially available synthesis tools, such as those available from Xilinx, Altera, etc., among others.

In one embodiment, the converting step 306 may utilize one or more pre-compiled function blocks from a library of pre-compiled function blocks 308. Thus, for certain function blocks which are difficult to compile, or less efficient to compile, from a hardware description into a net list format, the hardware description created in step 304 includes a reference to a pre-compiled function block from the library 308. The respective pre-compiled function blocks are simply inserted into the net list in place of these references in step 306. This embodiment of the invention thus includes the library 308 of pre-compiled function blocks which are used in creating the net list. This embodiment also includes hardware target specific information 310 which is used by step 306 in converting the hardware description into a net list which is specific to a certain type or class of FPGA.
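The substitution of pre-compiled function blocks for references may be pictured with the following minimal sketch. The `ref:` prefix, the list-of-strings representation of the hardware description and net list, and the function name `resolve_netlist` are assumptions made purely for illustration; actual synthesis tools use their own formats.

```python
def resolve_netlist(description, library):
    """Build a net list from a hardware description, replacing each
    reference to a pre-compiled function block with the block's contents.

    description: list of entries; an entry "ref:<name>" is a reference
                 (hypothetical notation) to a library block.
    library:     dict mapping block name -> list of pre-compiled entries.
    """
    netlist = []
    for entry in description:
        if entry.startswith("ref:"):
            name = entry[len("ref:"):]
            if name not in library:
                raise KeyError(f"no pre-compiled block named {name!r}")
            # Insert the pre-compiled block in place of the reference.
            netlist.extend(library[name])
        else:
            netlist.append(entry)
    return netlist


library = {"fp_mul": ["fp_mul_stage1", "fp_mul_stage2"]}
netlist = resolve_netlist(["adder", "ref:fp_mul", "register"], library)
```

Entries that are not references pass through unchanged, which mirrors the case where a function block is compiled directly from the hardware description.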

In step 312 the method operates to compile the net list into at least one heterogeneous hardware configuration program, e.g., an FPGA program file, also referred to as a software bit stream. The at least one heterogeneous hardware configuration program is a file that can be readily downloaded to program the heterogeneous hardware components, e.g., an FPGA and other heterogeneous or homogeneous programmable hardware devices, e.g., computing devices, such as heterogeneous system-on-chip (SOC) devices containing a plurality of computing elements (e.g., heterogeneous programmable hardware components).

After the net list has been compiled into at least one heterogeneous hardware configuration program (e.g., an FPGA program file) in step 312, then in step 314 the method may transfer the at least one heterogeneous hardware configuration program (e.g., the FPGA program file) to the programmable hardware, e.g., the FPGA and other programmable hardware components, to produce programmed hardware equivalent to the graphical program. Thus, upon completion of step 314, the portion of a graphical program referenced in step 304 is comprised as a hardware implementation in the heterogeneous system, e.g., in an FPGA and/or other programmable hardware element, and/or other programmable hardware components of the system.

It is noted that various of the above steps can be combined and/or can be made to appear invisible to the user. For example, steps 306 and 312 can be combined into a single step, as can steps 304 and 306. In one embodiment, after the user creates the graphical program in step 302, the user simply selects a hardware export option, and indicates the heterogeneous hardware targets or destinations, causing steps 304-314 to be automatically performed.

FIG. 7—Conversion of a Graphical Program into Machine Language and Hardware Implementations

FIG. 7 is a more detailed flowchart diagram illustrating one embodiment of the invention, including compiling a first portion of the graphical program into machine language and converting a second portion of the graphical program into a hardware implementation. As with the above methods, while the embodiments described may be in terms of a graphical program, it should be noted that the graphical program implementation is exemplary only, and that the techniques of FIG. 7 are also applicable to text based (i.e., textual) programs and/or combinations of textual and graphical programs.

As shown in FIG. 7, after the user has created (and/or received) a graphical program in step 302, the user can optionally select a first portion to be compiled into machine code for CPU execution as is normally done. In one embodiment, the user preferably selects a supervisory control and display portion of the graphical program to be compiled into machine code for CPU execution. The first portion comprising supervisory control and display portions is compiled for execution on a CPU, such as the host CPU in the computer 102 or the CPU 212 comprised on the interface card 114. This enables the supervisory control and display portions to execute on the host CPU, which is optimal for these elements of the program.

The user selects a second portion for conversion to hardware implementation, which is performed as described above in steps 304-314 of FIG. 6. The portion of the graphical program which is desired for hardware implementation preferably comprises modules or VIs which require a fast or deterministic implementation and/or are desired to execute in a stand-alone hardware unit. In general, portions of the graphical program which are desired to have a faster or more deterministic execution are converted into the hardware implementation. In one embodiment, the entire graphical program is selected for conversion to a hardware implementation, and thus step 322 is not performed.

FIG. 8—Creation of a Graphical Program

FIG. 8 is a more detailed flowchart diagram of step 302 of FIGS. 6 and 7, illustrating creation of a graphical program according to one embodiment of the invention. As shown, in step 342 the user arranges on the screen a graphical program or block diagram. This includes the user placing and connecting, e.g., wiring, various icons or nodes on the display screen in order to configure a graphical program. More specifically, the user selects various function icons or other icons and places or drops the icons in a block diagram panel, and then connects or “wires up” the icons to assemble the graphical program. The user also preferably assembles a user interface, referred to as a front panel, comprising controls and indicators which indicate or represent input/output to/from the graphical program. For more information on creating a graphical program in the LabVIEW graphical programming system, please refer to the LabVIEW system available from National Instruments as well as the above patent applications incorporated by reference.

In response to the user arranging on the screen a graphical program, in step 344 the method operates to develop and store a tree of data structures which represent the graphical program. Thus, as the user places and arranges on the screen function nodes, structure nodes, input/output terminals, and connections or wires, etc., the graphical programming system operates to develop and store a tree of data structures which represent the graphical program. More specifically, as the user assembles each individual node and wire, the graphical programming system operates to develop and store a corresponding data structure in the tree of data structures which represents the individual portion of the graphical program that was assembled. Thus, steps 342 and 344 form an iterative process which is repeatedly performed as the user creates the graphical program.

FIG. 9—Exporting a Portion of the Graphical Program to a Hardware Description

FIG. 9 is a flowchart diagram of step 304 of FIGS. 6 and 7, illustrating operation when the method exports a portion of the graphical program into a hardware description. As with the above methods, while the embodiments described may be in terms of a graphical program, it should be noted that the graphical program implementation is exemplary only, and that the techniques of FIG. 9 are also applicable to text based (i.e., textual) programs and/or combinations of textual and graphical programs.

The tree of data structures created and stored in step 344 preferably comprises a hierarchical tree of data structures based on the hierarchy and connectivity of the graphical program. As shown, in step 362 the method traverses the tree of data structures and in step 364 the method operates to translate each data structure into a hardware description format. In one embodiment, the method first flattens the tree of data structures prior to traversing the tree in step 362.
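The flatten, traverse, and translate sequence can be sketched as below. The `DiagramNode` type, the pre-order flattening, and the comment-style output of `translate` are assumptions chosen for illustration; the source does not specify the layout of the data structures or the traversal order.

```python
from dataclasses import dataclass, field


@dataclass
class DiagramNode:
    """Hypothetical data structure for one element of the graphical program."""
    kind: str
    name: str
    children: list = field(default_factory=list)


def flatten(root):
    """Flatten the hierarchical tree into a list (depth-first, pre-order)."""
    flat = []
    stack = [root]
    while stack:
        node = stack.pop()
        flat.append(node)
        # Push children in reverse so they are visited left to right.
        stack.extend(reversed(node.children))
    return flat


def translate(node):
    # Stand-in for step 364: emit one hardware-description line per structure.
    return f"-- {node.kind} {node.name}"


def export(root):
    """Traverse the flattened tree (step 362) and translate each entry."""
    return [translate(node) for node in flatten(root)]


tree = DiagramNode("vi", "top", [
    DiagramNode("function", "add"),
    DiagramNode("while_loop", "loop", [DiagramNode("function", "mul")]),
])
description = export(tree)
```

Flattening first means the translation step sees a simple sequence and does not need to recurse through the hierarchy itself.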

In the present embodiment, a number of different function icons and/or primitives can be placed in a diagram or graphical program for conversion into a hardware implementation. These primitives include, but are not limited to, function nodes, constants, global variables, control and indicator terminals, structure nodes, and sub-VIs, etc. Function icons or primitives can be any data type, but in the current embodiment are limited to Integer or Boolean data types. Also, global variables are preferably comprised on a single global panel for convenience. If a VI appears multiple times, then the VI is preferably re-entrant and may have state information. If a VI is not re-entrant, then preferably multiple copies of the VI are created in hardware if the VI has no state information, otherwise it would be an error.

In one embodiment, each node which is converted to a hardware description includes an Enable input, a Clear_Enable signal input, a master clock signal input and an Enable_Out or Done signal. The Enable input guarantees that the node executes at the proper time, i.e., when all of its inputs have been received. The Clear_Enable signal input is used to reset the node if state information remembers that the node was done. The Enable_Out or Done signal is generated when the node completes and is used to enable operation of subsequent nodes which receive an output from the node. Each node which is converted to a hardware description also includes the data paths depicted in the graphical program.
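The handshake among the Enable, Clear_Enable, and Done signals may be illustrated with a small behavioral model. This is a software sketch under stated assumptions, not a hardware description: the class `NodeModel`, its `clock` method, and the representation of upstream Done signals as a list of booleans are hypothetical, and the node is assumed to complete within a single clock tick.

```python
class NodeModel:
    """Behavioral model of one converted node (illustrative only)."""

    def __init__(self, func, n_inputs):
        self.func = func          # operation the node performs
        self.n_inputs = n_inputs  # number of data inputs
        self.done = False         # models the Enable_Out / Done signal
        self.output = None

    def clock(self, inputs_ready, values, clear_enable=False):
        """Advance the node by one master-clock tick and return Done."""
        if clear_enable:
            # Clear_Enable resets the remembered "done" state.
            self.done = False
            self.output = None
            return self.done
        # Enable is asserted only when every input has been received.
        enable = len(inputs_ready) == self.n_inputs and all(inputs_ready)
        if enable and not self.done:
            self.output = self.func(*values)
            self.done = True
        return self.done


node = NodeModel(lambda a, b: a + b, n_inputs=2)
first = node.clock([True, False], [1.5, None])   # one input still missing
second = node.clock([True, True], [1.5, 2.0])    # all inputs received
result = node.output
cleared = node.clock([], [], clear_enable=True)  # reset for the next run
```

In this model the Done value returned by one node would serve as an entry in `inputs_ready` for each downstream node that consumes its output.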

For While loop structures, Iteration structures, Sequence structures, and Case Structures, the respective structure is essentially abstracted to a control circuit or control block. The control block includes a diagram enable out for each sub-diagram and a diagram done input for each sub-diagram.

In addition to the above signals, e.g., the Enable input, the Clear_Enable signal input, the master clock signal input, and the Enable_Out or Done signal, all global variables have numerous additional signals, including CPU interface signals which are specific to the type of CPU and bus, but typically include data lines, address lines, clock, reset and device select signals. All VIs and sub-VIs also include CPU interface signals if they contain a global variable.

In one embodiment, when an icon is defined for a VI used solely to represent a hardware resource connected to the FPGA, e.g., an A/D converter, with a number of inputs and outputs, a string control is preferably placed on the front panel labeled VHDL. In this case, the default text of the string control is placed in the text file created for the VHDL of the VI. Thus, in one embodiment, a library of VIs is provided, each representing a physical component or resource available in or to the FPGA. As the VHDL files representing these VIs are used, the method of the present invention monitors their usage to ensure that each hardware resource is used only once in the hierarchy of VIs being exported to the FPGA. When the VHDL file is written, the contents of the string control are used to define the access method of that hardware resource.
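The check that each hardware resource is used only once can be sketched as follows. The function name `check_resource_usage` and the representation of the VI hierarchy as a list of (VI name, resource name) pairs are assumptions made for this example.

```python
def check_resource_usage(vi_resources):
    """Verify that no hardware resource is claimed by more than one VI.

    vi_resources: iterable of (vi_name, resource_name) pairs, where
                  resource_name is None for VIs that do not represent
                  a hardware resource (hypothetical representation).
    Returns a dict mapping each resource to the VI that uses it.
    """
    owners = {}
    for vi_name, resource in vi_resources:
        if resource is None:
            continue
        if resource in owners:
            raise ValueError(
                f"{resource} used by both {owners[resource]} and {vi_name}"
            )
        owners[resource] = vi_name
    return owners


owners = check_resource_usage([
    ("ReadADC0", "adc0"),
    ("Filter", None),
    ("WriteDAC0", "dac0"),
])
```

A second VI claiming `adc0` in the same hierarchy would raise `ValueError` in this sketch, corresponding to the usage monitoring described above.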

The following is pseudo-code which describes the operations performed in the flowchart of FIG. 9:

GenCircuit(vi)

-   send GenCircuit to top level diagram of vi

Diagram:GenCircuit(d)

-   send GenCircuit to each constant in d
-   send GenCircuit to each node in d
-   send GenCircuit to each signal in d

Signal:GenCircuit(s)

-   declare type of signal s

BasicNode:GenCircuit(n)

-   declare type of component needed for n
-   declare AND-gate for enabling n (if needed)
-   list connections for all node inputs
-   list connections for all inputs to enabling AND-gate (if needed)

Constant:GenCircuit(c)

-   declare type and value of constant c

WhileLoopNode:GenCircuit(n)

-   declare while loop controller component
-   declare AND-gate for enabling n (if needed)
-   list connections for all node inputs
-   list connections for all inputs to enabling AND-gate (if needed)
-   declare type of each shift register component
-   list connections for all inputs to all shift registers
-   declare type of each tunnel component
-   list connections for all inputs to all tunnels

CaseSelectNode:GenCircuit(n)

-   declare case select controller component
-   declare AND-gate for enabling n (if needed)
-   list connections for all node inputs
-   list connections for all inputs to enabling AND-gate (if needed)
-   declare type of each tunnel component
-   list connections for all inputs to all tunnels

SequenceNode:GenCircuit(n)

-   declare sequence controller component
-   declare AND-gate for enabling n (if needed)
-   list connections for all node inputs
-   list connections for all inputs to enabling AND-gate (if needed)
-   declare type of each tunnel component
-   list connections for all inputs to all tunnels

SubVINode:GenCircuit(n)

-   send GenCircuit to the subVI of n
-   associate inputs & outputs of subVI with those of n
-   declare AND-gate for enabling n (if needed)
-   list connections for all node inputs
-   list connections for all inputs to enabling AND-gate (if needed)
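The message-passing structure of the pseudo-code can be rendered as a small object-oriented sketch in which each element type supplies its own `gen_circuit` method. The class names follow the pseudo-code, but the emitted line format is not VHDL and is invented for illustration, and the rule that an enabling AND-gate is needed only when a node has more than one input is an assumption. The While loop, Case, and Sequence nodes would follow the same pattern and are omitted for brevity.

```python
class Signal:
    def __init__(self, name, dtype):
        self.name, self.dtype = name, dtype

    def gen_circuit(self, out):
        out.append(f"signal {self.name}: {self.dtype}")


class Constant:
    def __init__(self, name, dtype, value):
        self.name, self.dtype, self.value = name, dtype, value

    def gen_circuit(self, out):
        out.append(f"constant {self.name}: {self.dtype} = {self.value}")


class BasicNode:
    def __init__(self, name, component, inputs):
        self.name, self.component, self.inputs = name, component, list(inputs)

    def gen_enable(self, out):
        # Assumption: a gate is only needed for two or more inputs.
        if len(self.inputs) > 1:
            gate_inputs = ", ".join(f"{i}_done" for i in self.inputs)
            out.append(f"and_gate {self.name}_enable({gate_inputs})")
        for i in self.inputs:
            out.append(f"connect {i} -> {self.name}")

    def gen_circuit(self, out):
        out.append(f"component {self.name}: {self.component}")
        self.gen_enable(out)


class SubVINode(BasicNode):
    def __init__(self, name, subvi, inputs):
        super().__init__(name, "subvi", inputs)
        self.subvi = subvi

    def gen_circuit(self, out):
        self.subvi.gen_circuit(out)  # send GenCircuit to the subVI
        out.append(f"associate {self.name} <-> {self.subvi.name}")
        self.gen_enable(out)


class Diagram:
    def __init__(self, constants=(), nodes=(), signals=()):
        self.constants, self.nodes, self.signals = constants, nodes, signals

    def gen_circuit(self, out):
        # Constants first, then nodes, then signals, as in the pseudo-code.
        for item in (*self.constants, *self.nodes, *self.signals):
            item.gen_circuit(out)


class VI:
    def __init__(self, name, diagram):
        self.name, self.diagram = name, diagram

    def gen_circuit(self, out=None):
        out = [] if out is None else out
        self.diagram.gen_circuit(out)  # send to the top level diagram
        return out


inner = VI("scale", Diagram(
    constants=[Constant("k", "float32", 2.0)],
    nodes=[BasicNode("mul", "fp_mul", ["x", "k"])],
    signals=[Signal("x", "float32")],
))
top = VI("top", Diagram(
    nodes=[SubVINode("call_scale", inner, ["a"])],
    signals=[Signal("a", "float32")],
))
lines = top.gen_circuit()
```

Dispatching on the element type in this way keeps the traversal generic: the diagram does not need to know what each kind of node emits.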

Referring to the above pseudo code listing, the method starts at the VI level (the top level) and begins generation of VHDL by sending a message to the top level diagram. The method in turn effectively provides a message from the diagram to each constant, each node, and each signal in the diagram.

For signals, the method then declares the signal type.

For basic nodes, the method declares the type of the component needed, and also declares an AND-gate with the proper number of inputs needed in order to enable itself. In other words, basic nodes declare an AND-gate with a number of inputs corresponding to the number of inputs received by the node. Here, optimization is preferably performed to minimize the number of inputs actually needed. For example, if a node has three inputs, the node does not necessarily need a three input AND-gate if two of those inputs are coming from a single node. As another example, if one input comes from node A and another input comes from node B, but node A also feeds node B, then the input from node A is not needed in the AND gate. Thus various types of optimization are performed to reduce the number of inputs to each AND gate. For the basic node, the method also lists the connections for all of its inputs as well as the connections for all inputs to the enabling AND-gate.
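As a non-limiting illustration, the two input-reduction examples above may be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the actual implementation: the dictionary `feeds` (mapping each node to the nodes it drives) and the function names `reachable` and `minimal_enable_inputs` are hypothetical, and the diagram is assumed to be acyclic. A source is omitted from the enabling AND-gate when another source of the same node lies downstream of it, since that other source cannot complete before the first one has.

```python
def reachable(feeds, start):
    """Return every node reachable from `start` by following dataflow edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in feeds.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def minimal_enable_inputs(feeds, node):
    """Pick the source nodes whose control outputs must drive `node`'s AND-gate."""
    # Distinct sources only: two wires from one node need a single gate input.
    sources = sorted(src for src, dsts in feeds.items() if node in dsts)
    keep = []
    for src in sources:
        downstream = reachable(feeds, src)
        # If src also feeds another source of `node`, that other source's
        # control output already implies that src has completed.
        if not any(other in downstream for other in sources if other != src):
            keep.append(src)
    return keep


# Node A feeds both B and C, and B also feeds C (hypothetical diagram).
chain = {"A": ["B", "C"], "B": ["C"]}
print(minimal_enable_inputs(chain, "C"))

# Node N has three inputs, two of which come from the single node X.
shared = {"X": ["N", "N"], "Y": ["N"]}
print(minimal_enable_inputs(shared, "N"))
```

In the first case only B remains as a gate input; in the second case a two-input gate suffices for the three-input node.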

For a constant, the method simply declares the type and the value of the constant.

For a While loop, the method declares a While loop controller component. The method also declares an AND-gate, lists AND-gate inputs, and lists node inputs in a similar manner to the basic node described above. The method then declares the type for each shift register and includes a component for the shift register, and lists all the connections for the shift register inputs. If any tunnels are present on the While loop, the method declares the type of each tunnel component and lists the connections for the inputs to the tunnels. For most tunnels, the method simply equates the signals on the inside and outside, without any other effect.

The method proceeds in a similar manner for Case and Sequence structures. For Case and Sequence structures, the method declares a case select controller component or a sequence controller component, respectively. For both Case and Sequence structures, the method also declares an AND-gate, lists AND-gate inputs, and lists node inputs in a similar manner to the basic node described above. The method then declares the component needed for any tunnels and lists the connections for the inputs to the tunnels.

For a sub-VI, the method sends a message to the sub-VI and associates inputs and outputs of the sub-VI with those of n. The method then declares an AND-gate, lists AND-gate inputs, and lists node inputs in a similar manner to the basic node described above.
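By way of example only, the message-passing traversal described above may be sketched in Python as follows. The class names, the `gen_circuit` method, and the one-line declaration format are hypothetical stand-ins chosen for illustration; the sketch emits plain descriptive strings rather than VHDL and covers only constants, basic nodes, and signals.

```python
class Signal:
    def __init__(self, name, sig_type):
        self.name, self.sig_type = name, sig_type

    def gen_circuit(self, out):
        out.append(f"signal {self.name} : {self.sig_type}")


class Constant:
    def __init__(self, name, const_type, value):
        self.name, self.const_type, self.value = name, const_type, value

    def gen_circuit(self, out):
        out.append(f"constant {self.name} : {self.const_type} = {self.value}")


class BasicNode:
    def __init__(self, name, component, inputs, enables=()):
        self.name, self.component = name, component
        self.inputs, self.enables = list(inputs), list(enables)

    def gen_circuit(self, out):
        out.append(f"component {self.name} : {self.component}")
        if self.enables:  # the AND-gate is declared only "if needed"
            out.append(f"and_gate {self.name}_en <- {', '.join(self.enables)}")
        out.append(f"connect {self.name} <- {', '.join(self.inputs)}")


class Diagram:
    def __init__(self, constants, nodes, signals):
        self.constants, self.nodes, self.signals = constants, nodes, signals

    def gen_circuit(self, out):
        # The diagram forwards the message to each constant, node, and signal.
        for item in (*self.constants, *self.nodes, *self.signals):
            item.gen_circuit(out)
        return out


diagram = Diagram(
    constants=[Constant("c0", "int16", 0)],
    nodes=[BasicNode("add1", "adder", ["s1", "s2"], ["n1_done", "n2_done"])],
    signals=[Signal("s1", "int16"), Signal("s2", "int16")],
)
lines = diagram.gen_circuit([])
print("\n".join(lines))
```

Structure nodes and sub-VIs would add their own `gen_circuit` variants in the same style, each declaring its controller component before listing connections.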

FIG. 10—Exporting an Input Terminal into a Hardware Description

FIG. 10 is a flowchart diagram illustrating operation when the method exports an input terminal into the hardware description format. As shown, in step 402 the method determines if the data provided to the input terminal is input from a portion of the graphical program which will be executing on the CPU, i.e., the portion of the graphical program which is to be compiled into machine language for execution on the CPU, or whether the data is input from another portion of the graphical program that is also being transformed into a hardware implementation. As with the above methods, while the embodiments described may be in terms of a graphical program (e.g., graphical program terminals), it should be noted that the graphical program implementation is exemplary only, and that the techniques of FIG. 10 are also applicable to text based (i.e., textual) programs and/or combinations of textual and graphical programs. For example, instead of “terminals”, a text based program implementation may be directed to input/output argument lists of text based functions or programs.

As shown, if the data input to the input terminal is determined in step 402 to be input from a portion of the graphical program being compiled for execution on the CPU, in step 406 the method creates a hardware description of a write register with a data input and data and control outputs. The write register is operable to receive data transferred by the host computer, i.e., generated by the compiled portion executing on the CPU. In step 408 the data output of the write register is connected for providing data output to other elements in the graphical program portion. In step 408 the control output of the write register is connected to other elements in the graphical program portion for controlling sequencing of execution, in order to enable the hardware description to have the same or similar execution order as the graphical program.

If the data is determined in step 402 to not be input from a portion being compiled for execution on the CPU, i.e., the data is from another node in the portion being converted into a hardware implementation, then in step 404 the method ties the data output from the prior node into this portion of the hardware description, e.g., ties the data output from the prior node into the input of dependent sub-modules as well as control path logic to maintain the semantics of the original graphical program.

FIG. 11—Exporting a Function Node into a Hardware Description

FIG. 11 is a flowchart diagram illustrating operation where the method exports a function node into the hardware description format. In one embodiment, the term “function node” refers to any of various types of icons or items which represent a function being performed. Thus, a function node icon represents a function being performed in the graphical program. Examples of function nodes include arithmetic function nodes, e.g., add, subtract, multiply, and divide nodes, trigonometric and logarithmic function nodes, comparison function nodes, conversion function nodes, string function nodes, array and cluster function nodes, file I/O function nodes, etc. As with the above methods, while the embodiments described may be in terms of a graphical program, it should be noted that the graphical program implementation is exemplary only, and that the techniques of FIG. 11 are also applicable to text based (i.e., textual) programs and/or combinations of textual and graphical programs.

As shown in FIG. 11, in step 422 the method determines the inputs and outputs of the function node. In step 424 the method creates a hardware description of the function block corresponding to the function node with the proper number of inputs and outputs as determined in step 422. Alternatively, in step 424 the method includes a reference in the hardware description to a pre-compiled function block from the library 308. In this case, the method also includes the determined number of inputs and outputs of the function node.

In step 426 the method traverses the input dependencies of the node to determine which other nodes provide outputs that are provided as inputs to the function node being converted. In step 428 the method creates a hardware description of an N input AND gate, wherein N is the number of inputs to the node, with each of the N inputs connected to control outputs of nodes which provide inputs to the function node. The output of the AND gate is connected to a control input of the function block corresponding to the function node.

In the data flow diagramming model of one embodiment, a function node can only execute when all of its inputs have been received. The AND gate created in step 428 emulates this function by receiving all control outputs of nodes which provide inputs to the function node. Thus the AND gate operates to effectively receive all of the dependent inputs that are connected to the function node and AND them together to provide an output control signal which is determinative of whether the function node has received all of its inputs. The output of the AND gate is connected to the control input of the function block and operates to control execution of the function block. Thus, the function block does not execute until the AND gate output provided to the control input of the function block provides a logic signal indicating that all dependent inputs which are input to the function node have been received.
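The enabling behavior described above may be modeled, purely for illustration, by the following Python sketch. The classes `WriteRegister` and `FunctionBlock` and their attributes are hypothetical software stand-ins for the hardware elements; the sketch assumes each block fires at most once and ignores clocking.

```python
import operator


class WriteRegister:
    """Stand-in for a source whose control output goes true once data is written."""

    def __init__(self):
        self.enable_out, self.data = False, None

    def write(self, value):
        self.data, self.enable_out = value, True


class FunctionBlock:
    """A block that executes only when its enabling AND gate output is true."""

    def __init__(self, func, sources):
        self.func = func
        self.sources = sources      # upstream elements providing data and control
        self.enable_out = False     # control output: true once the result is valid
        self.data = None

    def step(self):
        # N-input AND gate over the control outputs of all source elements.
        enable_in = all(src.enable_out for src in self.sources)
        if enable_in and not self.enable_out:
            self.data = self.func(*(src.data for src in self.sources))
            self.enable_out = True


r1, r2, r3 = WriteRegister(), WriteRegister(), WriteRegister()
add1 = FunctionBlock(operator.add, [r1, r2])
add2 = FunctionBlock(operator.add, [add1, r3])

r1.write(1.5)
add1.step()
add2.step()
fired_early = add1.enable_out   # r2 has not arrived, so add1 must not have fired

r2.write(2.5)
r3.write(4.0)
add1.step()
add2.step()
print(fired_early, add1.data, add2.data)
```

The second block waits on the control output of the first block rather than on its data value, which is what preserves the execution order of the original diagram.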

FIG. 12—Exporting an Output Terminal into a Hardware Description

FIG. 12 is a flowchart diagram illustrating operation where the method exports an output terminal into the hardware description. As shown, in step 440 the method determines if the data provided from the output terminal is output to a portion of the graphical program which will be executing on the CPU, i.e., the portion of the graphical program which is to be compiled into machine language for execution on the CPU, or whether the data is output to another portion of the graphical program that is also being transformed into a hardware implementation. As with the above methods, while the embodiments described may be in terms of a graphical program, it should be noted that the graphical program implementation is exemplary only, and that the techniques of FIG. 12 are also applicable to text based (i.e., textual) programs and/or combinations of textual and graphical programs. For example, instead of “terminals”, a text based program implementation may be directed to input/output argument lists of text based functions or programs.

As shown, if the data output from the output terminal is determined in step 440 to be output to a portion of the graphical program being compiled for execution on the CPU, then in step 442 the method creates a hardware description of a read register with a data input and data and control outputs. The read register is operable to receive data generated by logic representing a prior node in the graphical program.

In step 444 the method connects the data output of a prior node to the data input of the read register. In step 444 the control input of the read register is also connected to control sequencing of execution, i.e., to guarantee that the read register receives data at the proper time. This enables the hardware description to have the same or similar execution order as the graphical program.

If the data is determined in step 440 to not be output to a portion being compiled for execution on the CPU, i.e., the data is to another node in the portion being converted into a hardware implementation, then in step 446 the method ties the data output from the output terminal into a subsequent node in this portion of the hardware description, e.g., ties the data output from the output terminal into the input of subsequent sub-modules as well as control path logic to maintain the semantics of the original graphical program.

FIG. 13—Exporting a Structure Node into a Hardware Description

FIG. 13 is a flowchart diagram illustrating operation where the method exports a structure node into the hardware description. In one embodiment, the term “structure node” refers to a node which represents control flow of data, including iteration, looping, sequencing, and conditional branching. Examples of structure nodes include For/Next loops, While/Do loops, Case or Conditional structures, and Sequence structures. For more information on structure nodes, please see the LabVIEW patents referenced above. As with the above methods, while the embodiments described may be in terms of a graphical program, it should be noted that the graphical program implementation is exemplary only, and that the techniques of FIG. 13 are also applicable to text based (i.e., textual) programs and/or combinations of textual and graphical programs. For example, instead of a “structure node”, a text based program implementation may be directed to corresponding text based software functions or structures.

The flowchart of FIG. 13 illustrates exporting a loop structure node into a hardware description. As shown, in step 462 the method examines the structure node parameters, e.g., the iteration number, loop condition, period, phase delay, etc. As discussed above, the graphical programming system preferably allows the user to insert certain parameters into a structure node to facilitate exporting the structure node into a hardware description. Iteration and looping structure nodes have previously included an iteration number and loop condition, respectively. According to one embodiment of the invention, these structure nodes further include period and phase delay parameters, which are inserted into or assigned to the structure node. These provide information on the period of execution and the phase delay of the structure node. As discussed below, the period and phase delay parameters, as well as the iteration number or loop condition, are used to facilitate exporting the structure node into a hardware description.

In step 464, the method inserts the structure node parameters into the hardware description. In step 466 the method inserts a reference to a pre-compiled function block corresponding to the type of structure node. In the case of a looping structure node, the method inserts a reference to a pre-compiled function block which implements the looping function indicated by the structure node. The method also connects controls to the diagram enclosed by the structure node.

FIG. 14—Converting a Node into a Hardware Description

FIG. 14 is a flowchart diagram of a portion of step 306 of FIGS. 6 and 7, illustrating operation where the method converts the hardware description for a node into a net list. FIG. 14 illustrates operation of converting a hardware description of a node, wherein the hardware description comprises a reference to a function block and may include node parameters. It is noted that where the hardware description of a node comprises a description of the actual registers, gates, etc. which perform the operation of the node, then conversion of this hardware description to a net list is readily performed using any of various types of synthesis tools. As with the above methods, while the embodiment described may be in terms of a graphical program, it should be noted that the graphical program implementation is exemplary only, and that the techniques of FIG. 14 are also applicable to text based (i.e., textual) programs and/or combinations of textual and graphical programs. For example, instead of “nodes”, a text based program implementation may be directed to corresponding text based functions or programs.

As shown, in step 502 the method examines the function block reference and any node parameters present in the hardware description. In step 504, the method selects the referenced pre-compiled function block from the library 308, which essentially comprises a net list describing the function block. In step 506 the method then configures the pre-compiled function block net list with any parameters determined in step 502. In step 508 the method then inserts the configured pre-compiled function block into the net list which is being assembled.
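Steps 502 through 508 may be illustrated by the following Python sketch. It is a simplified model only: the `LIBRARY` dictionary is a hypothetical stand-in for the library 308, the block and parameter names are invented for the example, and a real net list would carry far more structure than the dictionaries used here.

```python
import copy

# Hypothetical library of pre-compiled function blocks (stands in for library 308).
LIBRARY = {
    "adder": {"cells": ["add_core"], "params": {"width": None}},
    "while_loop": {"cells": ["loop_ctrl"], "params": {"period": None, "phase": None}},
}


def convert_node(description, net_list):
    """Append the configured function block for one node to `net_list`."""
    # Step 502: examine the function block reference and any node parameters.
    ref, params = description["ref"], description.get("params", {})
    # Step 504: select the referenced pre-compiled block (copied, so the
    # library entry itself is left unconfigured for later reuse).
    block = copy.deepcopy(LIBRARY[ref])
    # Step 506: configure the block with the parameters found in step 502.
    unknown = set(params) - set(block["params"])
    if unknown:
        raise KeyError(f"unsupported parameters for {ref}: {sorted(unknown)}")
    block["params"].update(params)
    # Step 508: insert the configured block into the net list being assembled.
    net_list.append(block)
    return net_list


net_list = convert_node(
    {"ref": "while_loop", "params": {"period": 1000, "phase": 0}}, []
)
print(net_list[0]["params"])
```

The same routine serves a structure node when the description carries period and phase delay parameters, as in the usage shown.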

FIG. 15—Converting a Structure Node into a Hardware Description

FIG. 15 is a flowchart diagram illustrating operation of the flowchart of FIG. 14, where the method converts the hardware description for a structure node into a net list. FIG. 15 illustrates operation of converting a hardware description of a structure node, wherein the hardware description comprises a reference to a structure node function block and includes structure node parameters. As with the above methods, while the embodiments described may be in terms of a graphical program, it should be noted that the graphical program implementation is exemplary only, and that the techniques of FIG. 15 are also applicable to text based (i.e., textual) programs and/or combinations of textual and graphical programs. For example, instead of a “structure node”, a text based program implementation may be directed to corresponding text based software functions or structures.

As shown, in step 502A the method examines the function block reference and the structure node parameters present in the hardware description. The structure node parameters may include parameters such as the iteration number, loop condition, period, phase delay, etc. In step 504A the method selects the referenced pre-compiled function block from the library 308, which essentially is a net list describing the structure node function block. In step 506A the method then configures the pre-compiled function block net list with the structure node parameters determined in step 502A. This involves setting the period and phase delay of execution of the structure node as well as any other parameters such as iteration number, loop condition, etc. In step 508A the method then inserts the configured pre-compiled function block into the net list which is being assembled.

FIG. 16—Function Block for a Structure Node

FIG. 16 is a block diagram illustrating an exemplary While loop function block 582. As shown, the While loop function block includes enabling, period, and phase inputs, as well as a loop control input. The While loop function block provides an index output which is provided to a floating point multiply and add node 584. The adder operates to increment each time the index signal is provided, to monitor the number of times the While loop is executed. The While loop further outputs Clear and Enable Out signals to control the program within the While loop and further receives a Loop Done signal input which is used to indicate whether the loop has completed. In a textual program implementation, the above features apply to a corresponding text-based software construct, e.g., a textual While loop.

FIG. 17—Operation of Structure Node Function Block

FIG. 17 is a state diagram illustrating operation of the While loop function block shown in FIG. 16. As shown, a diagram start operation proceeds to state A. When Phase Done is true, indicating that the phase has completed, the state machine advances to state B. The state machine remains in state B until the Loop Enable signal is true, indicating that the loop has been enabled to begin execution. When the Loop Enable signal is asserted, the state machine advances from state B to state C. In state C the Clear Output signal is asserted, clearing the loop output prior to execution of the loop.

The state machine then advances from state C to state D. In state D the computation is performed, and the Set Enable out signal is asserted. If the period is done and the loop is not yet completed, signified by the equation:

Period Done and /Loop Done

then the state machine proceeds to an error state and operation completes. Thus, the period set for execution for the loop was not sufficiently long to allow the loop to complete. In other words, the loop took more time to complete than the period set for execution of the loop.

The state machine advances from state D to state E when the Loop Done signal is asserted prior to the Period Done signal being asserted, indicating that the loop has completed prior to the period allotted for the loop execution being over.

The state machine then advances from state E to a wait state, as shown. If the period is done and the loop is not re-enabled, signified by the condition:

Period Done and /Loop Enabled

then the state machine advances from the Wait state to the Done state. If the period has completed and the loop is still enabled, indicating that another execution of the loop is necessary, then the state machine advances from the Wait state back to the C state. Thus, the state machine advances through states C, D, E, and Wait to perform looping operations. The above features are also applicable to textual program based equivalents, e.g., corresponding text based software constructs or functions.
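The state transitions described above may be summarized, as a sketch only, by the following Python transition function. The function name and keyword arguments are hypothetical. Two points are assumptions not stated in the description: the Loop Enable signal tested in state B and the Loop Enabled condition tested in the Wait state are treated as the same input, and when Loop Done and Period Done are asserted together in state D, Loop Done takes precedence.

```python
def next_state(state, phase_done=False, loop_enable=False,
               loop_done=False, period_done=False):
    """Return the state that follows `state` for the given input signals."""
    if state == "A":
        return "B" if phase_done else "A"
    if state == "B":
        return "C" if loop_enable else "B"
    if state == "C":            # Clear Output is asserted in this state
        return "D"
    if state == "D":            # computation runs; Set Enable Out is asserted
        if loop_done:
            return "E"
        return "Error" if period_done else "D"   # Period Done and /Loop Done
    if state == "E":
        return "Wait"
    if state == "Wait":
        if not period_done:
            return "Wait"
        return "C" if loop_enable else "Done"    # Period Done and /Loop Enabled
    return state                # "Done" and "Error" are terminal states


trace = ["A"]
inputs = [
    dict(phase_done=True),                      # A -> B
    dict(loop_enable=True),                     # B -> C
    dict(),                                     # C -> D
    dict(loop_done=True),                       # D -> E
    dict(),                                     # E -> Wait
    dict(period_done=True, loop_enable=True),   # Wait -> C (loop still enabled)
    dict(),                                     # C -> D
    dict(period_done=True),                     # D -> Error (period too short)
]
for signals in inputs:
    trace.append(next_state(trace[-1], **signals))
print(" -> ".join(trace))
```

The example trace runs one complete iteration and then overruns its period on the second, ending in the error state.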

FIG. 18—Simple Graphical Program Example

FIG. 18 illustrates a simple example of a graphical program. In FIG. 18 the graphical program includes three input terminals, specifically, a single precision input, a first double precision input, and a second double precision input, which could be a single precision input if desired, and one double precision output terminal. As may be seen, the graphical program simply comprises a first 2-input Add function node which receives input from the first two input terminals, and a second 2-input Add function node which receives the output from the first Add function node and receives an output from the third input terminal. The second 2-input Add function node provides an output to the double precision output terminal as shown.

FIG. 19—Hardware Result

FIG. 19 is a conceptual diagram of the resulting hardware after the graphical program example of FIG. 18 is converted into a hardware description. As shown, the hardware diagram includes three write registers 522-526 corresponding to each of the three input terminals. The data outputs of the first two write registers 522 and 524 are provided as inputs to a first two-input floating point multiply and add node 532, which corresponds to the first floating point multiply and add node in the block diagram of FIG. 18. The hardware description also involves creating an AND gate 534 which receives control outputs from each of the first two write registers 522 and 524 and provides a single output to the control input of the floating point multiply and add node 532. The purpose of the AND gate 534 is to prevent the floating point multiply and add node 532 from executing until both inputs have been received.

The Adder 532 provides a data output to a second two-input floating point multiply and add node 542, which corresponds to the second floating point multiply and add node in the block diagram of FIG. 18. The first floating point multiply and add node 532 also generates an enable out signal which is provided to an input of a second AND gate 536. The other input of the AND gate 536 receives an output from the third write register 526, corresponding to the third input terminal. The AND gate 536 provides an output to a control input of the second floating point multiply and add node 542. Thus, the AND gate 536 operates to ensure that the second floating point multiply and add node 542 does not execute until all inputs have been received by the floating point multiply and add node 542. The second floating point multiply and add node 542 provides a data output to a read register 546 associated with the output terminal. The second floating point multiply and add node 542 also provides an enable out signal to the read register 546, which notifies the read register 546 when valid data has been provided.

Thus, as shown, to create a hardware description for each of the input terminals, the flowchart diagram of FIG. 10 is executed, which operates to create a hardware description of a write register 522, 524, and 526, each with data and control outputs. For each floating point multiply and add function node, the flowchart diagram of FIG. 11 is executed, which operates to create a hardware description of an adder 532 or 542, and further creates an associated N input AND gate 534 or 536, with inputs connected to the dependent inputs of the adder function node to ensure execution at the proper time. Finally, the flowchart diagram of FIG. 12 is executed for the output terminal of the graphical program, which operates to generate a hardware description of a read register with data and control inputs. As noted above, textual program equivalents are also contemplated. In other words, the techniques disclosed above are directly applicable to corresponding textual programs targeted for deployment on programmable hardware.

FIGS. 20-22: Example of Converting a Graphical Program into a Hardware Implementation

FIGS. 20-22 comprise a more detailed example illustrating operation of the present invention, according to one embodiment. As with the above methods, while the embodiments described may be in terms of a graphical program, it should be noted that the graphical program implementation is exemplary only, and that the techniques of FIGS. 20-22 are also applicable to text based (i.e., textual) programs and/or combinations of textual and graphical programs. For example, instead of a graphical program with a graphical While loop, a text based program implementing a While loop with contained textual functions may be converted to a hardware implementation.

FIG. 20 illustrates an example graphical program (a LabVIEW diagram) which is converted into a hardware implementation, e.g., an FPGA implementation, using an embodiment of the present invention. As shown, the graphical program comprises a plurality of interconnected nodes comprised in a While loop. As shown, the While loop includes shift register icons, represented by the down and up arrows at the left and right edges, respectively, of the While loop. A 0 constant positioned outside of the While loop is connected to the down arrow of the shift register at the left edge of the While loop.

As FIG. 20 shows, inside the While loop, a floating point set point element and a floating point a/d (analog to digital) read node provide respective inputs to a floating point subtract node (triangular node with minus sign), which computes the difference between the input values and provides the difference as output. Below the floating point a/d read, a floating point “scale by power of 2” node (scaling node) receives inputs from a constant (−1) and a While loop left shift register, and outputs a scaled value, as shown. The outputs of the floating point subtract node and the scaling node are provided as x and y inputs to a textual code block, which computes an output z=(x+1)*y. A floating point add node (triangular node with plus sign) receives this output (z) and the output of the floating point subtract node as inputs and outputs the sum. A floating point multiply node (triangular node with “X”) receives respective inputs from a fixed point gain constant and the floating point add node and provides the resulting product to a floating point d/a write node.
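For illustration, the arithmetic performed by one pass through the loop body described above may be sketched in Python as follows. The function `loop_body` and its argument names are hypothetical. The operand order of the subtraction (set point minus a/d reading) is an assumption, since the description states only that the difference of the two inputs is computed, and the value written back to the right-hand shift register is not modeled because it is not stated above.

```python
import math


def loop_body(setpoint, ad_value, shift_reg, gain):
    """Compute the value passed to the d/a write node for one loop iteration."""
    diff = setpoint - ad_value            # floating point subtract node (order assumed)
    scaled = math.ldexp(shift_reg, -1)    # "scale by power of 2" node with n = -1
    z = (diff + 1.0) * scaled             # textual code block: z = (x + 1) * y
    total = z + diff                      # floating point add node
    return gain * total                   # floating point multiply node


da_value = loop_body(setpoint=5.0, ad_value=3.0, shift_reg=4.0, gain=2.0)
print(da_value)
```

With these sample inputs the difference is 2.0, the scaled value is 2.0, z is 6.0, the sum is 8.0, and the value written to the d/a is 16.0.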

As shown, the While loop also includes a timer icon representing or signifying timing for the While loop. The timer icon includes inputs for period and phase. As shown, the timer icon receives a constant of 1000 for the period and receives a constant of 0 for the phase. In an alternate embodiment, the While loop includes input terminals which are configured to receive timing information, such as period and phase.

FIG. 21 illustrates the LabVIEW data structures created in response to or representing the diagram or graphical program of FIG. 20. The data structure diagram of FIG. 21 comprises a hierarchy of data structures corresponding to the diagram of FIG. 20, and represents portions assigned (automatically) to respective heterogeneous hardware components, including at least one programmable communication element (which includes timing functionality). As shown, the LabVIEW data structure representation includes a top level diagram which includes a single signal connecting the 0 constant to the left hand shift register of the While loop. Thus the top level diagram includes only the constant (0) and the While loop.

The While loop includes a sub-diagram which further includes left and right shift register terms, the continue flag of the While loop, a plurality of constants, a timer including period and phase inputs, global variables setpoint and gain, sub-VIs a/d read and d/a write, and various function icons, e.g., scale, add, subtract, and multiply. Further, each of the objects in the diagram has terminals, and signals connect between these terminals.

FIG. 22 illustrates a circuit diagram representing the hardware description which is created in response to the data structures of FIG. 21. The circuit diagram of FIG. 22 implements the graphical program of FIG. 20. As shown, the CPU interface signals are bussed to the global variables. Although not shown in FIG. 22, the CPU interface signals are also provided to the sub-VIs a/d read and d/a write.

The While loop is essentially abstracted to a control circuit which receives the period and phase, and includes an external enable directing the top level diagram to execute, which starts the loop. The loop then provides a diagram enable (diag_enab) signal to start the loop and waits for a diagram done (diag_done) signal to signify completion of the loop, or the period to expire. Based on the value of the Continue flag, the loop provides a subsequent diag_enab signal or determines that the loop has finished and provides a Done signal to the top level diagram. Although not shown in FIG. 22, the loop control block also provides a diagram clear enable out (diag_clear_enab_out) signal to every node in the sub-diagram of the While loop. Thus the loop control block outputs a diagram enable (diag_enab) signal that is fed to all of the starting nodes in the diagram within the While loop. The Done signals from these items are fed into an AND gate, whose output is provided to enable subsequent nodes.

The shift register includes a data in, a data out and an enable input which clocks the data in (din) to the data out (dout), and a load which clocks the initial value into the shift register.
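The shift register behavior described above may be modeled by the following Python sketch, offered as an illustration only. The class `ShiftRegister` and its `clock` method are hypothetical, each call stands for one clock edge, and giving the load input priority over the enable input when both are asserted is an assumption.

```python
class ShiftRegister:
    """Software model of a loop shift register with load and enable inputs."""

    def __init__(self):
        self.dout = 0

    def clock(self, din, initval, enable_in=False, load=False):
        if load:                # load clocks the initial value into the register
            self.dout = initval
        elif enable_in:         # enable clocks data in (din) to data out (dout)
            self.dout = din
        return self.dout        # otherwise the stored value is held


sr = ShiftRegister()
after_load = sr.clock(din=7, initval=0, load=True)
after_enable = sr.clock(din=7, initval=0, enable_in=True)
after_hold = sr.clock(din=9, initval=0)
print(after_load, after_enable, after_hold)
```

In the example, the register is first loaded with the initial value, then captures the value produced by one loop iteration, and then holds that value while neither input is asserted.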

The following is an exemplary VHDL description corresponding to the example of FIGS. 20-22:

library ieee;
use ieee.std_logic_1164.all;

entity example0 is
  port (
    clk : in std_logic;
    enable_in : in std_logic;
    clr_enable_out : in std_logic;
    da_clk : in std_logic;
    cpu_clk : in std_logic;
    cpu_reset : in std_logic;
    cpu_iord : in std_logic;
    cpu_iowt : in std_logic;
    cpu_devsel : in std_logic;
    cpu_ioaddr : in std_logic_vector(31 downto 0);
    cpu_iodata : in std_logic_vector(31 downto 0);
    ad_clk : in std_logic;
    enable_out : out std_logic
  );
end example0;

architecture Structural of example0 is
  signal sCLK : std_logic;
  signal sda_clk : std_logic;
  signal scpu_clk : std_logic;
  signal scpu_reset : std_logic;
  signal scpu_iord : std_logic;
  signal scpu_iowt : std_logic;
  signal scpu_devsel : std_logic;
  signal scpu_ioaddr : std_logic_vector(31 downto 0);
  signal scpu_iodata : std_logic_vector(31 downto 0);
  signal sad_clk : std_logic;
  signal s1AC : std_logic_vector(15 downto 0);
  signal s115 : std_logic; -- node 114 enable_out
  constant cE8C : std_logic_vector(15 downto 0) := "0000000000000000"; -- 0
  signal s114 : std_logic; -- diagram done
  signal s116 : std_logic; -- diagram clr_enable_out
  signal s278D : std_logic; -- node 278C enable_out
  signal s145 : std_logic; -- node 144 enable_out

  component shift16
    port (
      clk : in std_logic;
      enable_in, load : in std_logic;
      initval : in std_logic_vector(15 downto 0);
      din : in std_logic_vector(15 downto 0);
      dout : out std_logic_vector(15 downto 0)
    );
  end component;

  signal s1310 : std_logic_vector(15 downto 0);
  signal s209C : std_logic_vector(15 downto 0);
  signal s1344 : std_logic_vector(15 downto 0);
  signal s1628 : std_logic_vector(15 downto 0);
  signal s1270 : std_logic_vector(15 downto 0);
  signal s1684 : std_logic_vector(15 downto 0);
  signal s19CC : std_logic_vector(15 downto 0);
  signal s1504 : std_logic_vector(15 downto 0);
  signal s149C : std_logic_vector(15 downto 0);
  signal sC44 : std_logic_vector(31 downto 0);
  signal s974 : std_logic_vector(31 downto 0);
  signal s4D8 : std_logic;
  signal s2A1 : std_logic; -- node 2A0 enable_out
  constant c470 : std_logic := '1';
  constant c948 : std_logic_vector(31 downto 0) := "00000000000000000000001111101000"; -- 1000
  constant cC04 : std_logic_vector(31 downto 0) := "00000000000000000000000000000000"; -- 0
  constant c1960 : std_logic_vector(15 downto 0) := "1111111111111111"; -- -1
  signal s2A0 : std_logic; -- diagram done
  signal s2A2 : std_logic; -- diagram clr_enable_out

  component write_reg
    port (
      clk : in std_logic;
      enable_in : in std_logic;
      clr_enable_out : in std_logic;
      cpu_clk : in std_logic;
      cpu_reset : in std_logic;
      cpu_iord : in std_logic;
      cpu_iowt : in std_logic;
      cpu_devsel : in std_logic;
      cpu_ioaddr : in std_logic_vector(31 downto 0);
      cpu_iodata : in std_logic_vector(31 downto 0);
      decodeaddr : in std_logic_vector(3 downto 0);
      data : out std_logic_vector(15 downto 0);
      enable_out : out std_logic
    );
  end component;

  signal s5BA : std_logic_vector(3 downto 0);
  constant c5B8 : std_logic_vector(3 downto 0) := "00";
  signal s1A7E : std_logic_vector(3 downto 0);
  constant c1A7C : std_logic_vector(3 downto 0) := "10";
  signal s641 : std_logic; -- node 640 enable_out
  signal s39D : std_logic; -- node 39C enable_out

  component a_d_read
    port (
      clk : in std_logic;
      enable_in, clr_enable_out : in std_logic;
      ai_read_val : out std_logic_vector(15 downto 0);
      ad_clk : in std_logic;
      enable_out : out std_logic
    );
  end component;

  signal s13A1 : std_logic; -- node 13A0 enable_out

  component prim_Scale_By_Power_Of_2_16
    port (
      clk : in std_logic;
      enable_in, clr_enable_out : in std_logic;
      x_2_n : out std_logic_vector(15 downto 0);
      x : in std_logic_vector(15 downto 0);
      n : in std_logic_vector(15 downto 0);
      enable_out : out std_logic
    );
  end component;

  signal s10E9 : std_logic; -- node 10E8 enable_out

  component prim_Subtract_16
    port (
      clk : in std_logic;
      enable_in, clr_enable_out : in std_logic;
      x_y : out std_logic_vector(15 downto 0);
      y : in std_logic_vector(15 downto 0);
      x : in std_logic_vector(15 downto 0);
      enable_out : out std_logic
    );
  end component;

  signal s14D1 : std_logic; -- node 14D0 enable_out

  component prim_Add_16
    port (
      clk : in std_logic;
      enable_in, clr_enable_out : in std_logic;
      x_y : out std_logic_vector(15 downto 0);
      y : in std_logic_vector(15 downto 0);
      x : in std_logic_vector(15 downto 0);
      enable_out : out std_logic
    );
  end component;

  signal s1A01 : std_logic; -- node 1A00 enable_out

  component prim_Multiply_16
    port (
      clk : in std_logic;
      enable_in, clr_enable_out : in std_logic;
      x_y : out std_logic_vector(15 downto 0);
      y : in std_logic_vector(15 downto 0);
      x : in std_logic_vector(15 downto 0);
      enable_out : out std_logic
    );
  end component;

  signal s1725 : std_logic; -- node 1724 enable_out

  component d_a_write
    port (
      clk : in std_logic;
      enable_in, clr_enable_out : in std_logic;
      a0_write_val : in std_logic_vector(15 downto 0);
      da_clk : in std_logic;
      enable_out : out std_logic
    );
  end component;

  component whileloop_timed
    port (
      clk : in std_logic;
      enable_in, clr_enable_out : in std_logic;
      diag_enable, diag_clr_enable_out : out std_logic;
      diag_done : in std_logic;
      period : in std_logic_vector(15 downto 0);
      phase : in std_logic_vector(15 downto 0);
      continue : in std_logic;
      enable_out : out std_logic
    );
  end component;

begin
  s114 <= s278D AND s145;
  s1AC <= cE8C;

  nDF8: shift16 port map(
    clk => sCLK, load => s115, enable_in => s2A0,
    initval => s1AC, din => s1344, dout => s19CC );

  s2A0 <= s1725;
  s4D8 <= c470;
  s974 <= c948;
  sC44 <= cC04;
  s1684 <= c1960;

  -- setpoint
  n5B8: write_reg port map(
    clk => sCLK, enable_in => s2A1, clr_enable_out => s2A2,
    enable_out => s5B9, cpu_clk => scpu_clk, cpu_reset => scpu_reset,
    cpu_iord => scpu_iord, cpu_iowt => scpu_iowt, cpu_devsel => scpu_devsel,
    cpu_ioaddr => scpu_ioaddr, cpu_iodata => scpu_iodata,
    decodeaddr => s5BA, data => s149C );
  s5BA <= c5B8;

  -- gain
  n1A7C: write_reg port map(
    clk => sCLK, enable_in => s2A1, clr_enable_out => s2A2,
    enable_out => s1A7D, cpu_clk => scpu_clk, cpu_reset => scpu_reset,
    cpu_iord => scpu_iord, cpu_iowt => scpu_iowt, cpu_devsel => scpu_devsel,
    cpu_ioaddr => scpu_ioaddr, cpu_iodata => scpu_iodata,
    decodeaddr => s1A7E, data => s1628 );
  s1A7E <= c1A7C;

  n39C: a_d_read port map(
    clk => sCLK, enable_in => s2A1, clr_enable_out => s2A2,
    ai_read_val => s1504, ad_clk => sad_clk, enable_out => s39D );

  n13A0: prim_Scale_By_Power_Of_2_16 port map(
    clk => sCLK, enable_in => s2A1, clr_enable_out => s2A2,
    x_2_n => s1270, x => s19CC, n => s1684, enable_out => s13A1 );

  s10E8 <= s39D AND s5B9;
  n10E8: prim_Subtract_16 port map(
    clk => sCLK, enable_in => s10E8, clr_enable_out => s2A2,
    x_y => s1310, y => s1504, x => s149C, enable_out => s10E9 );

  s14D0 <= s13A1 AND s10E9;
  n14D0: prim_Add_16 port map(
    clk => sCLK, enable_in => s14D0, clr_enable_out => s2A2,
    x_y => s1344, y => s1270, x => s1310, enable_out => s14D1 );

  s1A00 <= s14D1 AND s1A7D;
  n1A00: prim_Multiply_16 port map(
    clk => sCLK, enable_in => s1A00, clr_enable_out => s2A2,
    x_y => s209C, y => s1344, x => s1628, enable_out => s1A01 );

  n1724: d_a_write port map(
    clk => sCLK, enable_in => s1A01, clr_enable_out => s2A2,
    a0_write_val => s209C, da_clk => sda_clk, enable_out => s1725 );

  n144: whileloop_timed port map(
    clk => sCLK, enable_in => s115, clr_enable_out => s116,
    period => sC44, phase => s974,
    diag_enable => s2A1, diag_clr_enable_out => s2A2,
    diag_done => s2A0, continue => s4D8, enable_out => s145 );

  sCLK <= clk;
  s115 <= enable_in;
  s116 <= clr_enable_out;
  s114 <= enable_out;
  sda_clk <= da_clk;
  scpu_clk <= cpu_clk;
  scpu_reset <= cpu_reset;
  scpu_iord <= cpu_iord;
  scpu_iowt <= cpu_iowt;
  scpu_devsel <= cpu_devsel;
  scpu_ioaddr <= cpu_ioaddr;
  scpu_iodata <= cpu_iodata;
  sad_clk <= ad_clk;
end Structural;
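
As an informal aid to reading the structural listing above, the data flow it wires together may be sketched in software. The following Python sketch is illustrative only and is not part of the generated hardware description. It assumes that prim_Subtract_16 computes x minus y, that scaling by a power of 2 with n equal to -1 behaves as an arithmetic right shift by one bit, and it ignores 16-bit truncation. The function names `step` and `run` are hypothetical.

```python
def step(setpoint, gain, adc_sample, prev_sum):
    """One pass of the loop body, following the port maps of example0.

    Assumptions (not stated in the listing): subtract is x - y, and
    scale-by-power-of-2 with n = -1 is an arithmetic right shift by one.
    """
    error = setpoint - adc_sample   # n10E8: x => setpoint register, y => A/D value
    scaled = prev_sum >> 1          # n13A0: shift register output scaled by 2**-1
    total = error + scaled          # n14D0: also fed back through shift16 (nDF8)
    output = gain * total           # n1A00: product passed to d_a_write (n1724)
    return output, total


def run(setpoint, gain, samples, initial=0):
    """Iterate the loop body over a sequence of A/D samples."""
    prev_sum, outputs = initial, []
    for sample in samples:
        out, prev_sum = step(setpoint, gain, sample, prev_sum)
        outputs.append(out)
    return outputs
```

Under these assumptions, `run(10, 2, [4, 4, 4])` returns `[12, 18, 20]`, since each pass adds half of the previous sum to the current error before the gain is applied.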

Component Library

One embodiment of the present invention includes a component library that is used to aid in converting various primitives or nodes in a graphical program into a hardware description, such as a VHDL source file. The following provides two examples of VHDL components in this component library, these being components for a While loop and a multiplier primitive.

1. While Loop Component

The following comprises a VHDL component referred to as whileloop.vhd that the present invention uses when a While loop appears on a graphical program or diagram. Whileloop.vhd shows how a While loop in a graphical program is mapped to a state machine in hardware. It is noted that other control structures such as a “For loop” are similar. Whileloop.vhd is as follows:

library ieee;
use ieee.std_logic_1164.all;

entity whileloop is
  port (
    clk,
    enable_in,            -- start loop execution
    clr_enable_out        -- reset loop execution
      : in std_logic;
    diag_enable,          -- start contained diagram execution
    diag_clr_enable_out   -- reset contained diagram execution
      : out std_logic;
    diag_done,            -- contained diagram finished
    continue              -- iteration enabled
      : in std_logic;
    enable_out            -- looping complete
      : out std_logic
  );
end whileloop;

architecture rtl of whileloop is
  type state_t is (
    idle_st,   -- reset state
    test_st,   -- check for loop completion
    calc_st,   -- enable diagram execution
    end_st     -- assert enable out
  );
  signal nstate, state : state_t;
begin
  process(state, enable_in, clr_enable_out, diag_done, continue)
  begin
    diag_clr_enable_out <= '0';
    diag_enable <= '0';
    enable_out <= '0';
    case state is
      when idle_st =>
        diag_clr_enable_out <= '1';
        if enable_in='1' then
          nstate <= test_st;
        else
          nstate <= idle_st;
        end if;
      when test_st =>
        diag_clr_enable_out <= '1';
        if continue='1' then
          nstate <= calc_st;
        else
          nstate <= end_st;
        end if;
      when calc_st =>
        diag_enable <= '1';
        if diag_done='1' then
          nstate <= test_st;
        else
          nstate <= calc_st;
        end if;
      when end_st =>
        enable_out <= '1';
        nstate <= end_st;
    end case;
    -- Because it appears at the end of the process, this test
    -- overrides any previous assignments to nstate
    if clr_enable_out='1' then
      nstate <= idle_st;
    end if;
  end process;

  process(clk)
  begin
    if clk'event and clk='1' then
      state <= nstate;
    end if;
  end process;
end rtl;
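
The state machine in whileloop.vhd may be easier to follow as a small behavioral model. The Python sketch below mirrors the two processes of the listing: `next_state` corresponds to the combinational process and the loop in `run_loop` corresponds to the clocked process. The test harness itself (a contained diagram that asserts diag_done after a fixed number of clocks, and a continue input that stays true for a fixed number of iterations) is a hypothetical assumption introduced only for illustration.

```python
IDLE, TEST, CALC, END = "idle_st", "test_st", "calc_st", "end_st"


def next_state(state, enable_in, clr_enable_out, diag_done, cont):
    """Combinational process: returns (nstate, diag_enable,
    diag_clr_enable_out, enable_out) for the current state and inputs."""
    diag_enable = diag_clr = enable_out = 0
    if state == IDLE:
        diag_clr = 1
        nstate = TEST if enable_in else IDLE
    elif state == TEST:
        diag_clr = 1
        nstate = CALC if cont else END
    elif state == CALC:
        diag_enable = 1
        nstate = TEST if diag_done else CALC
    else:  # END
        enable_out = 1
        nstate = END
    if clr_enable_out:  # overrides any earlier choice of nstate
        nstate = IDLE
    return nstate, diag_enable, diag_clr, enable_out


def run_loop(iterations, diagram_latency=2, max_clocks=1000):
    """Clock the state machine until enable_out asserts.

    Hypothetical harness: the diagram takes `diagram_latency` clocks in
    calc_st before diag_done, and continue is true for `iterations` passes.
    Returns (completed iterations, list of states visited per clock).
    """
    state, completed, timer, trace = IDLE, 0, 0, []
    for _ in range(max_clocks):
        diag_done = 1 if (state == CALC and timer >= diagram_latency) else 0
        cont = 1 if completed < iterations else 0
        nstate, _, _, enable_out = next_state(state, 1, 0, diag_done, cont)
        trace.append(state)
        if enable_out:
            return completed, trace
        if state == CALC:
            if diag_done:
                completed, timer = completed + 1, 0
            else:
                timer += 1
        state = nstate  # clocked process: state <= nstate
    raise RuntimeError("loop did not finish within max_clocks")
```

With zero iterations the model visits idle_st, test_st and end_st in turn. With two iterations it passes through test_st three times, the last pass being the one that observes continue deasserted.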

2. Multiplier Primitive Component

The following comprises a VHDL component referred to as prim_multiply_16.vhd that the present invention uses when a multiplier primitive appears on a graphical program or diagram. By following the path from enable_in to enable_out, it can be seen how the self-timed logic works: each component asserts enable_out when the data output is valid. Other primitives like “add” or “less than” operate in a similar manner. Prim_multiply_16.vhd is as follows:

library ieee;
use ieee.std_logic_1164.all;

entity prim_multiply_16 is
  port (
    clk : in std_logic;
    enable_in : in std_logic;
    clr_enable_out : in std_logic;
    x_y : out std_logic_vector(15 downto 0);
    x : in std_logic_vector(15 downto 0);
    y : in std_logic_vector(15 downto 0);
    enable_out : out std_logic
  );
end prim_multiply_16;

architecture altera of prim_multiply_16 is
  COMPONENT lpm_mult
    GENERIC (
      LPM_WIDTHA : POSITIVE;
      LPM_WIDTHB : POSITIVE;
      LPM_WIDTHS : POSITIVE;
      LPM_WIDTHP : POSITIVE;
      LPM_REPRESENTATION : STRING := "UNSIGNED";
      LPM_PIPELINE : INTEGER := 0;
      LPM_TYPE : STRING := "L_MULT"
    );
    PORT (
      dataa : IN STD_LOGIC_VECTOR(LPM_WIDTHA-1 DOWNTO 0);
      datab : IN STD_LOGIC_VECTOR(LPM_WIDTHB-1 DOWNTO 0);
      aclr : IN STD_LOGIC := '0';
      clock : IN STD_LOGIC := '0';
      sum : IN STD_LOGIC_VECTOR(LPM_WIDTHS-1 DOWNTO 0) := (OTHERS => '0');
      result : OUT STD_LOGIC_VECTOR(LPM_WIDTHP-1 DOWNTO 0)
    );
  END COMPONENT;

  signal l_x, l_y : std_logic_vector(15 downto 0);
  signal l_xy : std_logic_vector(31 downto 0);
  signal l_enable_in : std_logic;
begin
  -- synchronize the incoming and outgoing data to guarantee
  -- a registered path on data through the multiplier
  -- register enable_out so it won't assert before data is
  -- available.
  process(clk)
  begin
    if clk'event and clk='1' then
      if clr_enable_out='1' then
        enable_out <= '0';
        l_enable_in <= '0';
      else
        enable_out <= l_enable_in;
        l_enable_in <= enable_in;
      end if;
      l_x <= x;
      l_y <= y;
      x_y <= l_xy(15 downto 0);
    end if;
  end process;

  gainx: lpm_mult
    GENERIC map (
      LPM_WIDTHA => 16, LPM_WIDTHB => 16, LPM_WIDTHS => 1, LPM_WIDTHP => 32,
      LPM_REPRESENTATION => "UNSIGNED", LPM_PIPELINE => 0 )
    PORT map ( dataa => l_x, datab => l_y, result => l_xy );
end altera;
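
The self-timed behavior described above, in which enable_out is registered so that it cannot assert before the product is available, may be illustrated with a cycle-level model. The Python sketch below is an informal approximation of the clocked process in the listing, assuming an unsigned, unpipelined combinational multiplier between the input and output registers. The class name `PrimMultiply16` and its `clock` method are hypothetical.

```python
class PrimMultiply16:
    """Cycle-level sketch of the registered multiplier and its enable chain."""

    def __init__(self):
        self.l_x = 0          # registered copy of x
        self.l_y = 0          # registered copy of y
        self.l_enable_in = 0  # registered copy of enable_in
        self.x_y = 0          # registered low 16 bits of the product
        self.enable_out = 0

    def clock(self, x, y, enable_in, clr_enable_out=0):
        """Apply one rising clock edge; returns (x_y, enable_out).

        As with VHDL signal assignment, every right-hand side is
        evaluated using the values held before the edge.
        """
        product = self.l_x * self.l_y  # combinational multiplier output
        if clr_enable_out:
            new_enable_out, new_l_enable_in = 0, 0
        else:
            new_enable_out, new_l_enable_in = self.l_enable_in, enable_in
        self.x_y = product & 0xFFFF    # only the low 16 bits are kept
        self.l_x, self.l_y = x & 0xFFFF, y & 0xFFFF
        self.enable_out, self.l_enable_in = new_enable_out, new_l_enable_in
        return self.x_y, self.enable_out
```

In this model, holding x at 3, y at 5 and enable_in at 1 yields `(0, 0)` after the first edge and `(15, 1)` after the second, so the data and its enable arrive together two clocks after the inputs are presented.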

FIGS. 23-25—Exemplary Graphical Source Code

FIGS. 23-25 illustrate exemplary graphical source code listings of a graphical program, according to one embodiment. It should be noted that the graphical program source code shown is exemplary only, and is not intended to limit the graphical programs contemplated to any particular form, function, or appearance.

Acceleration of Simulations and Other Computationally Intensive Tasks:

The present techniques are broadly applicable to the field of textual or graphical data flow programming of heterogeneous hardware components (HHC) using floating-point constructs for real-time, faster-than-real-time and slower-than-real-time simulation, digital signal processing, algorithms, mathematics, optimization, artificial intelligence, search and other compute intensive tasks, including applications in the field of system simulation, e.g., multi-physics simulation of a system such as a circuit, electric power grid, motor, generator, power inverter, power converter, electromagnetics, communication network, system of actors, or other complex physical system, including computationally irreducible systems along with embedded software code and sets of configuration parameters associated with the system simulation, e.g., control software, analysis software or digital signal processing software.

As discussed above in detail, the parallel, floating-point program or graphical program, e.g., graphical data flow program or diagram, may be automatically assigned to configure a heterogeneous hardware element or systems of heterogeneous hardware elements including internal and external communication and timing constraints for these purposes. In other words, the simulation may be represented using graphical programming, textual programming, or a combination of graphical, textual and other representations. The configured programmable hardware element may implement a hardware implementation of the program, including floating-point math functionality. The present techniques may also include graphical data transfer and synchronization mechanisms that enable a plurality of targets executing graphical floating-point math to simulate complex physical systems in which measurements, state-values, inputs, outputs and parameters may be shared between targets and represented using graphical floating-point programming constructs such as nodes, functions and wires. In some embodiments, the simulation mathematics may be represented graphically in a plurality of formats and structures including, but not limited to, state-space, nodal analysis, differential equations, algebraic equations, differential algebraic equations, state-charts, look up tables, descriptive CAD drawings or visual system representations, or finite element analysis. Multiple instances of the simulation mathematics may be executed concurrently, i.e., in parallel, on HHCs with populations of identical or varying configuration parameters, states, or simulation mathematics.
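
The idea of running multiple instances of the same simulation mathematics over a population of varying configuration parameters may be illustrated in software. The Python sketch below is a host-side analogy only and does not target any HHC. It assumes a hypothetical first-order RC circuit model integrated with a fixed-step forward Euler method, and it uses a thread pool merely as a stand-in for concurrent hardware execution. The names `simulate_rc` and `run_population` are introduced here for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_rc(params, dt=1e-4, steps=1000, v_in=1.0):
    """Forward-Euler step response of a hypothetical RC low-pass circuit.

    params is an (R, C) pair; returns the capacitor voltage after
    steps * dt seconds of simulated time.
    """
    r, c = params
    v = 0.0
    for _ in range(steps):
        v += dt * (v_in - v) / (r * c)  # dv/dt = (v_in - v) / (R * C)
    return v


def run_population(population, workers=4):
    """Run one simulation instance per parameter set, concurrently.

    Results are returned in the same order as the population.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate_rc, population))


if __name__ == "__main__":
    # Two instances that differ only in capacitance (illustrative values).
    print(run_population([(1000.0, 1e-4), (1000.0, 1e-3)]))
```

Each instance is independent of the others, which is the property that allows such a population to be spread across parallel floating-point resources.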

In some embodiments, while the real-time or faster-than-real-time simulation is executing on the HHCs, feedback may be incorporated in an open loop or closed loop manner based, for example, on data from physical measurements such as phasor-measurement units or other instruments related to the system being simulated, other simulations, user interface events, or events driven automatically based on the state of the simulation. The simulation timestep may be fixed or variable, and may be negotiated automatically among the HHC, systems of HHCs, external simulators and input/output mechanisms such as external instrumentation systems, sensors or user interfaces (see, e.g., U.S. patent application Ser. No. 13/347,880, titled “Co-Simulation with Peer Negotiated Time Steps”, which was incorporated by reference above). Internal or external information may also be used to inform or transform the state of the simulation. The HHC based simulator may have the ability to automatically switch in a “bumpless” manner between various model representations and look-up-table datasets, which may represent the system in different configurations or may represent the system with different levels of fidelity.

In this way, embodiments of the present techniques may enable automated hardware acceleration of simulations and other computationally intensive tasks using a (possibly graphical) programming environment and floating point math on HHCs.

Global Optimization of a Program Targeted to Heterogeneous Programmable Hardware

The techniques disclosed herein may also be applied to global optimization of complex programs. The following describes optimization of a program, e.g., a graphical program, or a textual program, with floating point math functionality, and targeted for deployment to a system with heterogeneous hardware components, according to some exemplary embodiments.

For example, in some embodiments, mathematical optimization techniques and algorithms, including global optimization techniques, may be used in combination with floating point math for computing the value of a function or simulation by execution of the floating point math on HHCs. Thereby, given user defined goals and constraints, a design space represented using graphical floating point math may be automatically explored for the purpose of selecting or synthesizing one or more of: an optimal set of parameters, component values, software tuning parameters, alternative system designs and circuit topologies, alternative models or model representations, combinations, curve fitting coefficients, calibration parameters, component lifetime, system reliability, margin of safety, cost, time, path length, resources, circuit design, design synthesis, planning, logistics, and/or manufacturing options, among others. Such exploration of the design space may provide means to evaluate a plurality of non-linear design tradeoffs from a set of simulated or mathematically modeled alternatives using measurements from a simulated or physical system that is parameterized, modeled, or otherwise configured using (possibly graphical) floating point math executing in programmable hardware elements.

Moreover, in some embodiments, optimization, search, decision, and Bayesian probabilistic techniques, implemented using textual, graphical programming, or other methods, may be integrated with the high speed, parallel execution of floating-point data flow math on reconfigurable hardware targets, which is needed to grapple with complex non-linear, multi-domain design tradeoffs including non-deterministic polynomial-time hard (NP-hard) problems and computationally irreducible problems. For example, as applied to the design of power converters for renewable energy, electric vehicle and smart grid applications, these techniques may enable the designers of these complex, multi-physics, networked systems to optimize for multiple design goals simultaneously, including, for example, one or more of: energy efficiency, cost, component lifetime, systematic reliability, regulatory compliance, interoperability and compatibility, and other differentiating product features as necessary to increase the performance-per-dollar and other positive attributes of next generation renewable energy systems.

In various embodiments, the optimization techniques may include evolutionary algorithms, neural or fuzzy algorithms capable of searching complex non-linear systems containing multiple variables, complex mathematics, or multiple design constraints, among others. Multiple parallel floating-point simulations of the system may be executed on the HHCs which may be fed populations of identical or varying configuration parameters, states, or simulation mathematics by the global optimization routine.

In this way, high order, non-linear design spaces may be explored using hardware acceleration to identify “global optimal” choices of topologies, component choices, control software tuning gains, and so forth.

Globally Optimal Inverter Designs

The global optimization of power inverter and control software designs involving multiple variables with non-linear tradeoffs is extremely computationally intensive, and so the technology has previously been limited to relatively simple systems. However, real-time and faster-than-real-time power electronics and grid simulation technologies made possible by the present techniques, e.g., using newly introduced floating point math capabilities and heterogeneous SOCs containing a mix of DSP cores, FPGA fabric and microprocessors, facilitate global optimization of more complex systems. One particular approach utilizes new global optimization algorithms based on a technique called “differential evolution” that is capable of dealing with complex non-linear systems containing multiple “false positive” solutions and multiple design constraints.
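
For readers unfamiliar with differential evolution, the following Python sketch shows one common textbook variant (often written DE/rand/1/bin) with simple clamping to parameter bounds. It is a minimal illustration under stated assumptions, not the optimization routine of any particular embodiment. The function name, the default control values `f` and `cr`, and the multimodal test function in the usage example are hypothetical choices. In practice, each call to `cost` would correspond to one simulation of a candidate design.

```python
import math
import random


def differential_evolution(cost, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=100, seed=0):
    """Minimize cost(vector) over box bounds with DE/rand/1/bin.

    bounds is a list of (low, high) pairs, one per parameter.
    Returns (best_vector, best_cost).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [cost(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct members other than the current one.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clamp to the design constraints
                else:
                    v = pop[i][j]
                trial.append(v)
            s = cost(trial)
            if s <= scores[i]:  # greedy selection keeps the better candidate
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


if __name__ == "__main__":
    # Hypothetical cost surface with many local minima (Rastrigin function).
    def rastrigin(x):
        return 10 * len(x) + sum(
            xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)

    print(differential_evolution(rastrigin, [(-5.0, 5.0)] * 2, generations=200))
```

Because every trial vector in a generation can be scored independently, the cost evaluations are natural candidates for the parallel floating-point simulations described above.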

For example, consider the problem of finding a globally optimal design for an electric motor or magnetic levitation half-bridge IGBT inverter control system, such as that shown in FIG. 26. The goal may be to design an inverter with the best performance, highest energy efficiency, longest component lifetime and minimum cost. There are constraints based on the temperature, voltage and current limitations of the IGBTs. The first goal may be to optimize the power electronics circuit design and then the control software tuning to achieve a globally optimal result that spans the boundary between the multi-physics (electro-thermal) circuit design and the embedded software design. To do this, the circuit design may be exported to a development environment, e.g., LabVIEW FPGA™, to create multiple parallel floating-point simulations of the system, and the global optimization routine may execute the simulations in parallel until the design space has been fully or at least adequately explored, where the various parameters defining the design space may be varied over the different simulations, and the corresponding performance, energy efficiency, component lifetime, and cost for each simulated system compared to determine the optimum solution.

Of course, these techniques may be applied to any type of system simulation as desired.

Although the embodiments above have been described in considerable detail, numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. More specifically, it should be noted that any combinations of the above techniques and elements may be used as desired. It is intended that the following claims be interpreted to embrace all such variations and modifications.

We claim:
 1. A non-transitory computer accessible memory medium that stores program instructions for configuring a system of heterogeneous hardware components, wherein the program instructions are executable by a processor to: create a program that includes floating point math functionality, wherein the program comprises a plurality of interconnected nodes that visually indicate functionality of the program, wherein the program is targeted for distributed deployment on a system comprising heterogeneous hardware components, including at least one programmable hardware element, at least one digital signal processor (DSP) core, and at least one programmable communication element (PCE); automatically determine respective portions of the program for deployment to respective ones of the heterogeneous hardware components, including automatically determining respective execution timing for the respective portions; automatically generate first program code implementing communication functionality between the at least one programmable hardware element and the at least one DSP core, wherein the first program code is targeted for deployment to the at least one programmable communication element; and automatically generate at least one hardware configuration program from the program and the first program code, wherein said automatically generating comprises compiling the respective portions of the program and the first program code for deployment to respective ones of the heterogeneous hardware components; wherein the hardware configuration program is deployable to the system, wherein after deployment, the system is configured to execute the program concurrently, including the floating point math functionality.
 2. The non-transitory computer accessible memory medium of claim 1, wherein the system comprises a heterogeneous system on a chip.
 3. The non-transitory computer accessible memory medium of claim 1, wherein the system comprises a heterogeneous system implemented on multiple chips.
 4. The non-transitory computer accessible memory medium of claim 1, wherein the program instructions are further executable to: automatically deploy the hardware configuration program to the system.
 5. The non-transitory computer accessible memory medium of claim 1, wherein the heterogeneous hardware components further include at least one microprocessor.
 6. The non-transitory computer accessible memory medium of claim 1, wherein the heterogeneous hardware components further include at least one graphics processing unit (GPU).
 7. The non-transitory computer accessible memory medium of claim 1, wherein the at least one PCE comprises one or more PCEs for internal communications between the at least one programmable hardware element and the at least one DSP core.
 8. The non-transitory computer accessible memory medium of claim 1, wherein the at least one PCE comprises at least one I/O block for communications between the at least one programmable hardware element or the at least one DSP core and external components or systems.
 9. The non-transitory computer accessible memory medium of claim 1, wherein the system comprises one or more chips, and wherein the at least one PCE is configurable for intra-chip communications or inter-chip communications.
 10. The non-transitory computer accessible memory medium of claim 1, wherein to create the program, the program instructions are executable to: generate the program based on one or more of: at least one text-based program; at least one simulation or model; at least one circuit diagram; at least one network diagram; or at least one statechart.
 11. The non-transitory computer accessible memory medium of claim 1, wherein the program comprises a data flow program.
 12. The non-transitory computer accessible memory medium of claim 1, wherein the program comprises a plurality of data transfer and synchronization mechanisms represented by floating-point programming nodes, functions, and wires, wherein the data transfer and synchronization mechanisms are deployable to the heterogeneous hardware components, thereby enabling the heterogeneous hardware components implementing the floating-point math functionality to simulate physical systems in which measurements, state-values, inputs, outputs and parameters are shared between the heterogeneous hardware components.
 13. The non-transitory computer accessible memory medium of claim 1, wherein the program comprises multiple models of computation.
 14. A method for configuring a system of heterogeneous hardware components, the method comprising: creating a program that includes floating point math functionality, wherein the program comprises a plurality of interconnected nodes that visually indicate functionality of the program, wherein the program is targeted for distributed deployment on a system comprising heterogeneous hardware components, including at least one programmable hardware element, at least one digital signal processor (DSP) core, and at least one programmable communication element (PCE); automatically determining respective portions of the program for deployment to respective ones of the heterogeneous hardware components, including automatically determining respective execution timing for the respective portions; automatically generating first program code implementing communication functionality between the at least one programmable hardware element and the at least one DSP core, wherein the first program code is targeted for deployment to the at least one programmable communication element; and automatically generating at least one hardware configuration program from the program and the first program code, wherein said automatically generating comprises compiling the respective portions of the program and the first program code for deployment to respective ones of the heterogeneous hardware components; wherein the hardware configuration program is deployable to the system, wherein after deployment, the system is configured to execute the program concurrently, including the floating point math functionality.
 15. The method of claim 14, wherein the system comprises a heterogeneous system on a chip.
 16. The method of claim 14, wherein the system comprises a heterogeneous system implemented on multiple chips.
 17. A non-transitory computer accessible memory medium that stores program instructions for configuring a system of heterogeneous hardware components, wherein the program instructions are executable by a processor to: create a program that includes floating point functionality, wherein the program is targeted for distributed deployment on a system comprising heterogeneous hardware components, including: at least one programmable hardware element; at least one digital signal processor (DSP) core; at least one programmable communication element (PCE); automatically determining respective portions of the program for respective deployment to the heterogeneous hardware components, including automatically determining respective execution timing for the respective portions, wherein the respective portions comprise: a first portion targeted for deployment to the at least one programmable hardware element; and a second portion targeted for deployment to the at least one DSP core; automatically generating program code implementing communication functionality between the at least one programmable hardware element and the at least one DSP core, using the at least one communication element; automatically generating at least one hardware configuration program from the program, including: compiling the first portion of the program for deployment to the at least one programmable hardware element, thereby generating a first portion of the at least one hardware configuration program; compiling the second portion of the program for deployment to the at least one DSP core, thereby generating a second portion of the at least one hardware configuration program; compiling the automatically generated program code implementing communication functionality for deployment to the at least one communication element, thereby generating a third portion of the at least one hardware configuration program; wherein the hardware configuration program is deployable to the system, including: configuring the at least one programmable hardware element with the first portion of the at least one hardware configuration program; configuring the at least one DSP core with the second portion of the at least one hardware configuration program; and configuring the at least one communication element with the third portion of the at least one hardware configuration program; wherein after deployment, the system is configured to execute the program concurrently, including the floating point functionality, wherein during execution: the at least one programmable hardware element performs the functionality of the first portion of the program; the at least one DSP core performs the functionality of the second portion of the program; and the at least one communication element implements communication between the at least one programmable hardware element and the at least one DSP core.
 18. The non-transitory computer accessible memory medium of claim 17, wherein the system comprises a heterogeneous system on a chip.
 19. The non-transitory computer accessible memory medium of claim 17, wherein the system comprises a heterogeneous system implemented on multiple chips.
 20. The non-transitory computer accessible memory medium of claim 17, wherein the program comprises a data flow program.